OutBackDingo | is there a secret ansible recipe to make this all work | 00:20 |
OutBackDingo | TASK [Authenticate to the cloud and retrieve the service catalog] ************************************************************************************************************************************************************************************************************************************************************* | 00:21 |
OutBackDingo | fatal: [localhost]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "msg": "Cloud default was not found."} | 00:21 |
OutBackDingo | running openstack-ansible -i inventory playbooks/healthcheck-openstack.yml fails with the above message | 00:22 |
jrosser | OutBackDingo: that is running against localhost, and would be looking for clouds.yaml or an openrc file i think | 07:09 |
jrosser | OutBackDingo: this is relevant https://github.com/openstack/openstack-ansible/blob/4d6c3a2ec743e149505e5b9c936dacee6d6d4379/releasenotes/notes/openstack-service-setup-host-f38d655eed285f57.yaml | 07:11 |
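The "Cloud default was not found." failure above means the client on the deploy host has no cloud named "default" defined. A minimal clouds.yaml sketch along the lines of what jrosser is pointing at might look like this, assuming a keystone endpoint on the internal VIP; every value below is a placeholder rather than something taken from this deployment:

```yaml
# Hypothetical ~/.config/openstack/clouds.yaml; all values are placeholders.
clouds:
  default:                                       # the error above indicates a cloud named "default" is looked up
    auth:
      auth_url: https://172.29.236.101:5000/v3   # assumption: your internal VIP / keystone endpoint
      username: admin
      password: CHANGE_ME                        # e.g. the keystone admin password from user_secrets.yml
      project_name: admin
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne
    interface: internal
    identity_api_version: 3
```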
*** arxcruz is now known as arxcruz|out | 07:15 | |
jrosser | on the other hand we really only use this script in our CI tests and don't document its use outside of that, so....... | 07:17 |
opendevreview | Merged openstack/openstack-ansible-ops master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-ops/+/835696 | 08:35 |
opendevreview | Merged openstack/openstack-ansible-plugins master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/835727 | 08:46 |
noonedeadpunk | should we merge https://review.opendev.org/c/openstack/ansible-role-pki/+/830794 and https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/830179/9 ? | 09:53 |
jrosser | i would certainly like the first one as it fixed some broken behaviour | 09:54 |
jrosser | second one i am maybe not confident about the IDP parts | 09:56 |
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/835728 | 09:56 |
opendevreview | Merged openstack/openstack-ansible-memcached_server master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-memcached_server/+/835693 | 09:58 |
jrosser | noonedeadpunk: the keystone patch is probably OK - except we do not test k2k and neither do i have a deployment like that | 09:59 |
jrosser | so testing if the IDP changes are working is not something i have had an opportunity to do | 09:59 |
opendevreview | Merged openstack/openstack-ansible-lxc_hosts master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/835692 | 10:03 |
noonedeadpunk | ok, gotcha | 10:10 |
opendevreview | Merged openstack/openstack-ansible-rsyslog_client master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-rsyslog_client/+/835730 | 10:15 |
OutBackDingo | anyone ? openstack firewall groups inactive... shouldn't it be active? | 10:31 |
OutBackDingo | can't ping instance | 10:31 |
opendevreview | Merged openstack/openstack-ansible-repo_server master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/835729 | 10:38 |
jrosser | OutBackDingo: I think we'd need a bit more context than that to understand what you mean | 10:41 |
OutBackDingo | so i've deployed openstack-ansible and launched an instance, and we can't ping that instance from the compute host | 10:42 |
OutBackDingo | @jrosser i'm not deeply familiar enough with openstack networking to debug this | 10:42 |
OutBackDingo | only thing i see in the ui is where it says the openstack firewall group is inactive.. not sure it even means anything related | 10:44 |
OutBackDingo | but i would expect it should be active | 10:45 |
jrosser | OutBackDingo: do you think that the compute node should be able to ping the instance? that will only be possible if you've set up the networking to enable that to happen | 10:48 |
jrosser | you would need the vm to be on a flat network which is the same as, or can route to the compute node network | 10:49 |
jrosser | or a vlan network that can do the same | 10:49 |
jrosser | and with / without a neutron router as your use-case required | 10:49 |
jrosser | or a vxlan network with a neutron router and provider network that can contact the compute node | 10:50 |
jrosser | many many possibilities, all enabled by openstack-ansible but you need to decide which is appropriate and configure it that way | 10:51 |
OutBackDingo | trying to console the vm even i get "Something went wrong, connection is closed" | 11:05 |
*** dviroel|out is now known as dviroel | 11:17 | |
admin18 | OutBackDingo, you are not supposed to ping the instance from the compute host | 12:00 |
admin18 | they should not see each other .. unless they are via a router | 12:00 |
OutBackDingo | @admin1 ok so if not, how can i 1) debug the instance console 2) validate it has internet via ping | 12:07 |
admin1 | first thing is to fix the console .. and once console is there, launch a cirros instance that allows you to login with the default user/pass and then check if you can ping the gateway from inside the instance | 12:08 |
noonedeadpunk | OutBackDingo: 2) if it's part of internal network, then you should have router to be created for that network. routers are served with neutron-l3-agents and in fact these are simple network namespaces. So you should be able to ping from there | 12:09 |
admin1 | noonedeadpunk beat me to that | 12:09 |
admin1 | you should be able to ping the internal ip of the instance via the router namespace | 12:10 |
OutBackDingo | ok, how do i login to the router instance | 12:23 |
admin1 | first check where the router is | 12:26 |
admin1 | and then ssh to that node | 12:26 |
admin1 | and then ip netns exec $namespace bash | 12:26 |
admin1 | and then you can do things inside like ip -4 a ; iptables -L -n -t nat etc | 12:27 |
OutBackDingo | so it says 10.16.64.110 router_gateway | 12:28 |
OutBackDingo | "check where the router is ? meaning on what node is sitting | 12:29 |
OutBackDingo | says its active on controller-3 | 12:30 |
OutBackDingo | @admin1 ip netns exec $namespace bash ?? what defines $namespace | 12:31 |
OutBackDingo | qrouter-24bd726d-b3b3-4b9d-9447-4a5e322518d6 | 12:32 |
admin1 | yep | 12:32 |
admin1 | ssh to controll3, and then run ip netns exec qrouter-24bd726d-b3b3-4b9d-9447-4a5e322518d6 bash | 12:32 |
admin1 | then you are inside the namespace | 12:32 |
admin1 | ip -4 a ; ifconfig ; ip route show ; iptables -L -nvx -t nat .. will show you details | 12:32 |
OutBackDingo | nope nothing | 12:32 |
admin1 | it won't say a "Welcome to namespace" message .. typical unix .. but echo $? is 0 :) | 12:33 |
OutBackDingo | root@controller-3:~# ip netns exec qrouter-24bd726d-b3b3-4b9d-9447-4a5e322518d6 bash | 12:33 |
OutBackDingo | root@controller-3:~# | 12:33 |
OutBackDingo | nada | 12:33 |
admin1 | yes | 12:33 |
admin1 | that is how it is | 12:33 |
admin1 | the bash is now inside the namespace | 12:34 |
OutBackDingo | oh | 12:34 |
admin1 | there is no docker/python type helper to let you know where you are | 12:34 |
admin1 | ip -4 a ; ifconfig ; ip route show ; iptables -L -nvx -t nat | 12:34 |
admin1 | those will show you the details you need to know .. | 12:34 |
OutBackDingo | anyone want to take a look i can pastebin it | 12:35 |
OutBackDingo | @admin1 and should i be able to ping the instance ips from inside the router | 12:38 |
jrosser | from the network namespace you should be able to ping the internal IP of the instance | 12:41 |
jrosser | and you should also be able to ping "outward" to whatever the next hop is on your external/provider network | 12:42 |
jrosser | you could paste that stuff if you like | 12:43 |
admin1 | based on your security groups, you can or cannot ping .. | 12:45 |
OutBackDingo | ok seems i can ping the instance ips, the floating ips and the primary router interface ip | 12:45 |
OutBackDingo | but cannot ping the listed gateway | 12:45 |
admin1 | but if you ping and then run tcpdumps, you will be able to trace the packets | 12:45 |
admin1 | so router -> instance can ping ? | 12:45 |
admin1 | then you need to trace your external network .. as in why packets are not reaching the gateway | 12:46 |
admin1 | how is the external network ? is it flat or vlan based ? | 12:46 |
OutBackDingo | vlan | 12:47 |
admin1 | linuxbridge or openvswitch ? | 12:49 |
admin1 | you can tcpdump in the br-vlan interface and see if you see tagged packets leaving the physical interface | 12:50 |
admin1 | if you do, then you have to check if switch is allowing tagged and if the router can see the mac/packets in that vlan | 12:50 |
admin1 | you need to do a bit of arp/mac address hunting in the switch and router to see in which interface they appear | 12:50 |
OutBackDingo | linuxbridge | 12:51 |
admin1 | cat /proc/net/vlan/config .. check if you can see the vlan tag added to the right interface | 12:51 |
OutBackDingo | from the controller i can ping 10.16.64.1 | 12:51 |
admin1 | that is a different route | 12:51 |
admin1 | how you can ping from the controller is different from how you can ping from the namespace | 12:52 |
OutBackDingo | so seems something not tying the router interface 10.16.64.100 to 10.16.64.1 | 12:52 |
admin1 | i have 17 mins before a meeting .. i can help you over zoom if you can share the screen .. | 12:53 |
admin1 | can you pastebin the output of cat /proc/net/vlan/config | 12:54 |
admin1 | you should be able to see br-vlan.$TAG on br-vlan | 12:54 |
admin1 | if you see that, it means 99% of the time your side is OK .. and then you need to check the switch/router side | 12:54 |
OutBackDingo | yeah br-vlan seems wonky | 12:56 |
OutBackDingo | bond0.1696 | 1696 | bond0 | 12:57 |
OutBackDingo | br-vlan.2464 | 2464 | br-vlan | 12:57 |
admin1 | that is good | 12:57 |
admin1 | the tag is proper | 12:57 |
OutBackDingo | where in netplan br-vlan is bond0.1696 | 12:57 |
admin1 | and how it should be | 12:57 |
OutBackDingo | so whats br-vlan.2464 | 12:57 |
admin1 | you cannot add br-vlan on top of a vlan | 12:57 |
admin1 | that is your problem right there | 12:57 |
admin1 | do this | 12:57 |
admin1 | give bond0 to br-vlan | 12:57 |
admin1 | that way, it will add the right tag to the bond | 12:58 |
OutBackDingo | give bond0 to br-vlan ? | 12:58 |
admin1 | your br-vlan CANNOT be on top of a tagged interface unless you are using it as flat or vlan Q-in-Q | 12:58 |
lowercase | Correct, it's Bond -> then Bridge then Vlan | 12:58 |
admin1 | in your netplan, br-vlan is what ? it's bond0.1696 ? | 12:59 |
OutBackDingo | yes | 12:59 |
admin1 | make it br-vlan ; interface => bond0 | 12:59 |
admin1 | without any vlan tags | 12:59 |
admin1 | br-vlan needs to own the bond0 without any tags | 12:59 |
admin1 | nothing will break .. | 12:59 |
OutBackDingo | huh... its the same on the compute hosts also | 13:00 |
OutBackDingo | ughhh | 13:00 |
admin1 | its a simple netplan generate;apply and restarting of the l3 agents | 13:01 |
admin1 | you don't need to re-run any playbooks | 13:01 |
OutBackDingo | so it should be actually | 13:01 |
OutBackDingo | br-vlan: | 13:01 |
OutBackDingo | interfaces: | 13:01 |
OutBackDingo | - bond0 | 13:01 |
admin1 | yes | 13:01 |
admin1 | because neutron adds the tags when you create an external network .. so it will create and pass the right vlan on the bond | 13:02 |
admin1 | right now its trying to send 2464 on top of 1696 | 13:02 |
admin1 | so br-vlan is always on bond0 untagged | 13:02 |
admin1 | at most you may need to delete the router and readd the external network | 13:03 |
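A rough netplan sketch of the layout admin1 describes, where br-vlan enslaves bond0 untagged and neutron adds provider tags (such as 2464) on the bond itself; the physical NIC names are placeholders and the management/addressing pieces are left out here:

```yaml
# Sketch only: br-vlan owns bond0 untagged; neutron creates bond0.<tag> itself when needed.
network:
  version: 2
  bonds:
    bond0:
      interfaces: [ens1f0, ens1f1]   # placeholder NIC names
      parameters:
        mode: 802.3ad
  bridges:
    br-vlan:
      interfaces: [bond0]            # no VLAN id here - tagging is neutron's job
      dhcp4: false
      dhcp6: false
```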
OutBackDingo | crap now netplan throwing an error /etc/netplan/50-cloud-init.yaml:59:15: Error in network definition: br-vxlan: interface 'bond0.1680' is not defined | 13:06 |
OutBackDingo | also note br-mgmt: is on bond0 with an ip, can i add br-vlan on bond0 also without ip | 13:07 |
opendevreview | Merged openstack/ansible-role-pki master: Refactor conditional generation of CA and certificates https://review.opendev.org/c/openstack/ansible-role-pki/+/830794 | 13:19 |
spatel | OutBackDingo i am using netplan in my cloud, here is the sample of one of infra node - https://paste.opendev.org/show/bT7L4Gw4SBtdccIJIe4g/ | 13:29 |
admin1 | OutBackDingo, you can | 13:32 |
admin1 | the way neutron uses it will be with a vlan tag | 13:32 |
admin1 | so your normal traffic and tagged traffic are different | 13:32 |
lowercase | OutBackDingo: here is a sample one from one of my hypervisors | 13:39 |
lowercase | https://paste.opendev.org/show/bqoF0VQU8vty3FxaO3gB/ | 13:39 |
spatel | lowercase no MTU 9000 ? | 13:46 |
lowercase | everything should be mtu 9000 | 13:46 |
lowercase | let me check | 13:47 |
lowercase | huh.... dev doesn't have mtu set to 9k, but prod does | 13:48 |
spatel | I am using MTU 9000 on only br-vxlan and br-storage. We have lots of legacy servers in DC which is configured for 1500 default so trying to avoid two kind of MTU :) | 13:48 |
spatel | +1 | 13:48 |
lowercase | Our dev environment runs on significantly older hardware so its possible there is a hardware issue preventing that. I'll look into it..... later | 13:49 |
OutBackDingo | ok something really funky, i cannot add br-vlan on bond0, netplan then says /etc/netplan/50-cloud-init.yaml:59:15: Error in network definition: br-vxlan: interface 'bond0.1680' is not defined | 14:03 |
OutBackDingo | ill pastebin this config... | 14:04 |
NeilHanlon | lowercase: possibly has to do with network config, too. jumbo frames need to be supported across the net | 14:12 |
OutBackDingo | do i need tagged vlans at all ? | 14:15 |
OutBackDingo | its a small cloud... | 14:15 |
NeilHanlon | you should separate your management, data, storage, etc planes, yes | 14:15 |
OutBackDingo | @NeilHanlon why ? | 14:16 |
OutBackDingo | logic ? | 14:16 |
OutBackDingo | what is the logic for it? | 14:16 |
lowercase | Security, isolation, and preventing cross talk. | 14:17 |
lowercase | I also want to say that it helps with the logical layout as well. It's easier to know where things go and how they should be "organized" on the network | 14:18 |
lowercase | Security is the number 1, however. | 14:18 |
NeilHanlon | bandwidth considerations as well. for small clouds you don't necessarily need more than a single interface. I suppose it's also possible to do everything on a single network without segregation, but it'd likely require significant modification to the config | 14:18 |
lowercase | I almost say it would be harder to do without vlans than with lol | 14:19 |
NeilHanlon | heh, I agree | 14:19 |
NeilHanlon | OutBackDingo: this is a good read w.r.t. OSA https://docs.openstack.org/openstack-ansible/latest/user/network-arch/example.html | 14:20 |
OutBackDingo | yeah im just being asked by higher powers | 14:20 |
lowercase | Are you point lead on this project? | 14:20 |
OutBackDingo | no, im the guy trying to figure out why qrouter networking is broken | 14:21 |
jrosser | OutBackDingo: tbh it sounds like you are nearly there | 14:21 |
OutBackDingo | i get the vlan configuration, just not so much how they did it | 14:21 |
jrosser | host networking is always hard, just needs wrangling till it works | 14:22 |
OutBackDingo | @jrosser i agree | 14:22 |
jrosser | and every-single-time, reconfiguring a host from complex-setup-A to complex-setup-B is fraught with trouble | 14:22 |
NeilHanlon | i'll be honest setting up the initial external network/routing stuff in openstack is something I perpetually struggle with | 14:22 |
jrosser | it's quite often better to reboot into the new config than try to do complex changes in-place | 14:23 |
OutBackDingo | @jrosser yeah when netplan isn't complaining br-vlan is wanting to go on untagged | 14:24 |
NeilHanlon | jrosser++ on that. host network stacks are not the most... stable APIs :) | 14:24 |
OutBackDingo | and then throws br-vxlan under the bus | 14:24 |
admin1 | paste the config you have for netplan now OutBackDingo | 14:24 |
admin1 | pastebin* | 14:24 |
OutBackDingo | https://pastebin.com/DGLmeu4k | 14:26 |
OutBackDingo | personally id move br-mgmt to bond0.1680 and br-vlan to bond0 | 14:27 |
OutBackDingo | just reversing them | 14:27 |
OutBackDingo | errm bond0.1696 rather | 14:29 |
admin1 | OutBackDingo, an equivalent one from one of my clusters .. https://pastebin.com/raw/uMDHD7en | 14:29 |
NeilHanlon | Yeah, that looks weird to me. IMO br-vlan cannot be a bridge on top of a vlan interface | 14:29 |
admin1 | and OutBackDingo, you don't need to give nameservers for br-vxlan, etc | 14:29 |
admin1 | maybe the one i pasted helps you simplify yours | 14:30 |
OutBackDingo | yupp welp that broke it | 14:44 |
OutBackDingo | cant even ping it now, maas rescue mode to save the day | 14:45 |
OutBackDingo | @admin1 is this the same netplan config on ALL your hosts? controller/compute and storage | 14:47 |
noonedeadpunk | So when I had a single interface, I used a vlan interface for the public network, and used bond0 for br-vlan | 14:55 |
*** dviroel is now known as dviroel_ | 15:04 | |
*** dviroel_ is now known as dviroel | 15:04 | |
*** dviroel is now known as dviroel|lunch | 15:44 | |
admin1 | yes .. storage has 1 more bond for replication | 15:49 |
noonedeadpunk | (unless mgmt net is not used for that for whatever reason :)) | 15:52 |
noonedeadpunk | but yes, absolutely! | 15:53 |
OutBackDingo | @admin1 meaning yes, same config on all hosts, except your replication bond | 15:55 |
OutBackDingo | seems im having an issue adding br-vlan on bond0 when br-mgmt is also there | 15:56 |
OutBackDingo | br-mgmt has the ip | 15:56 |
OutBackDingo | but it never comes up / cant ping it after a reboot | 15:56 |
OutBackDingo | wondering if i should move the ip to br-vlan | 15:57 |
OutBackDingo | see if that shakes it loose | 15:57 |
noonedeadpunk | why not create another vlan for br-mgmt? | 15:59 |
noonedeadpunk | so bond0 - br-vlan, bond0.100 - public, bond0.200 - br-mgmt, bond0.300 - br-storage | 15:59 |
noonedeadpunk | or smth like that | 16:00 |
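Roughly what that split could look like in netplan, if the switch trunks those VLANs through to the hosts; the VLAN IDs 200/300 and the address are illustrative, not taken from this deployment:

```yaml
# Illustrative sketch of noonedeadpunk's suggested layout; IDs and addresses are placeholders.
network:
  version: 2
  vlans:
    bond0.200:
      id: 200
      link: bond0
    bond0.300:
      id: 300
      link: bond0
  bridges:
    br-vlan:
      interfaces: [bond0]           # untagged, handed to neutron for provider VLANs
      dhcp4: false
      dhcp6: false
    br-mgmt:
      interfaces: [bond0.200]
      addresses: [10.16.48.23/24]   # example host address from this discussion
      dhcp4: false
    br-storage:
      interfaces: [bond0.300]
      dhcp4: false
```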
OutBackDingo | @noonedeadpunk meaning bond0.100 is just an alias on bond0 - same network interfaces | 16:02 |
OutBackDingo | theres only 1 bonded pair of interfaces in this box | 16:02 |
OutBackDingo | 2 100Gb | 16:03 |
noonedeadpunk | I think alias would be bond0:100?:) | 16:04 |
noonedeadpunk | I meant vlans | 16:05 |
noonedeadpunk | oh, wait, you can't have vlans | 16:05 |
noonedeadpunk | damn | 16:05 |
noonedeadpunk | clean forgot | 16:05 |
noonedeadpunk | then disregard | 16:05 |
noonedeadpunk | OutBackDingo: basically, you don't need br-vlan then at all | 16:05 |
OutBackDingo | yupp my issue seems to be making br-vlan happy on bond0 along with br-mgmt | 16:05 |
OutBackDingo | uhmm whys that ? | 16:06 |
noonedeadpunk | if you can't pass vlans through your switch this is smth you don't need for sure | 16:06 |
noonedeadpunk | and if you can pass vlans - then just make mgmt/stor/vlan as separate vlans | 16:07 |
noonedeadpunk | (as I suggested) | 16:07 |
noonedeadpunk | as for tenant networks you need to have only vxlans to be fair | 16:07 |
NeilHanlon | it would also probably help to see the network config on the switch side. it's hard to guess at how your switches are configured | 16:07 |
noonedeadpunk | and for vxlan you don't need even a bridge - it can be anything having IP address through which vxlan would be built | 16:08 |
jrosser | 2 x 100Gb without vlans? surely not..... | 16:10 |
OutBackDingo | @NeilHanlon not sure where switch configuration matters here, when im just trying to configure br-vlan untagged on bond0 and it doesnt work | 16:11 |
jrosser | i am pretty confused by all of this tbh | 16:14 |
jrosser | OutBackDingo: if you can ask a simple question, showing the config and the error output and it's surrounding context, things might be easier to understand | 16:14 |
admin1 | OutBackDingo, is your ports on hybrid mode ? | 16:14 |
admin1 | that they allow both tagged and untagged traffic at the same time | 16:14 |
admin1 | its configured in the switches | 16:14 |
admin1 | based on switch, you have a default pv ( vlan) and tagged vlans | 16:15 |
OutBackDingo | yes | 16:15 |
admin1 | so it's not trunk or switchport, but something in the middle | 16:15 |
OutBackDingo | correct | 16:15 |
admin1 | to test, you can also direct add ip on the bond0 and test it | 16:15 |
jrosser | admin1: but we struggle with netplan here, not switches? | 16:15 |
OutBackDingo | admin1: there is an ip on the bond | 16:15 |
OutBackDingo | br-mgmt: | 16:16 |
OutBackDingo | addresses: | 16:16 |
OutBackDingo | - 10.16.48.23/24 | 16:16 |
OutBackDingo | gateway4: 10.16.48.1 | 16:16 |
OutBackDingo | interfaces: | 16:16 |
OutBackDingo | - bond0 | 16:16 |
admin1 | the netplan i used similar to his use case is the one working for me .. bond0 untagged that does ssh for mgmt, ( i added ips on br-vlan) and then tags on bond0 for the api | 16:16 |
admin1 | this is on br-mgmt | 16:16 |
admin1 | so now br-mgmt is directly on bond0 ? | 16:16 |
admin1 | then your br-vlan will not work | 16:16 |
OutBackDingo | it always was directly on bond0 | 16:17 |
OutBackDingo | which is why br-vlan isnt working | 16:17 |
admin1 | ip on bond0 does not interfere with br-vlan sitting on top of bond0 | 16:17 |
admin1 | ip on bond0 or br-vlan on bond0 is untagged .. neutron adds a new tagged interface on bond0 and send traffic | 16:18 |
admin1 | so they do not interfere with one another | 16:18 |
admin1 | first thing for you to do .. remove all tags etc .. just add bond0 in the netplan and then ping your gateway .. | 16:18 |
OutBackDingo | welp something does because if i put br-vlan on bond0 ... and reboot i can't get back onto it | 16:18 |
admin1 | if it works, then slowly add the rest | 16:18 |
admin1 | what is the netplan file that you have, before you reboot ? | 16:18 |
OutBackDingo | https://pastebin.com/9mP9hXfP | 16:20 |
NeilHanlon | OutBackDingo: the switch config will determine how you can configure your server's networking, so it does matter | 16:20 |
OutBackDingo | @admin1 ^ | 16:22 |
OutBackDingo | pastebinned it | 16:22 |
jrosser | maybe i ask a silly question, but why try to do this all through one bond when there appear to be 6 interfaces? | 16:24 |
OutBackDingo | NeilHanlon: yes i get that but none of it will matter unless i can get br-vlan on bond0 with br-mgmt also | 16:24 |
noonedeadpunk | OutBackDingo: I truly do not understand why you just not create another vlan for br-mgmt??? | 16:24 |
OutBackDingo | @jrosser i didnt design it... and im told 2 x 100gb is plenty for this small setup | 16:25 |
OutBackDingo | noonedeadpunk: i tried, didnt work | 16:25 |
noonedeadpunk | what didn't work?:) | 16:25 |
OutBackDingo | if you look at the pastebin i tried to move br-mgmt to bond0.1696 | 16:25 |
OutBackDingo | which is what br-vlan was on | 16:25 |
OutBackDingo | basically reversing them | 16:25 |
jrosser | you know br-vlan kind of represents a trunk port (tagged) ? | 16:26 |
noonedeadpunk | Well, I don't see that on pastebin you provided :) And not sure what didn't work | 16:26 |
OutBackDingo | right now it seems like br-mgmt with ip is only happy on bond0 and refuses to share bond0 with br-vlan | 16:26 |
noonedeadpunk | yes, you can not have br-vlan and br-mgmt on same interface | 16:27 |
noonedeadpunk | as neutron will takeover br-vlan | 16:27 |
noonedeadpunk | But I'm pretty sure that br-mgmt on another vlan is good idea | 16:27 |
jrosser | we have an example file too https://github.com/openstack/openstack-ansible/blob/master/etc/netplan/01-static.yml | 16:28 |
noonedeadpunk | OR, you can just skip having br-vlan - do you need vlans for your tenants? | 16:28 |
noonedeadpunk | as this is why it's even existing | 16:28 |
OutBackDingo | @noonedeadpunk maybe try the previous pastebin https://pastebin.com/DGLmeu4k | 16:28 |
noonedeadpunk | in most of deployments ppl use _only_ vxlans | 16:28 |
noonedeadpunk | But what hasn't worked when you set br-mgmt on bond0.1696 ?:) | 16:29 |
OutBackDingo | @noonedeadpunk correct | 16:29 |
noonedeadpunk | that was a question :) | 16:30 |
noonedeadpunk | you have changed that on all hosts? | 16:30 |
OutBackDingo | i basically tried to reverse br-vlan on bond0.1696 and br-mgmt with ip on bond0 | 16:30 |
OutBackDingo | making br-vlan bond0 and br-mgmt bond0.1696 and rebooted | 16:31 |
OutBackDingo | and after the reboot could not ping / access the node | 16:31 |
OutBackDingo | as br-mgmt has the primary ip | 16:31 |
noonedeadpunk | ok, but you should have done that on other nodes then as well | 16:31 |
noonedeadpunk | or well, you don't have the vlan reachable / routable then | 16:32 |
OutBackDingo | welp, shit! you're right | 16:32 |
jrosser | imho a simple 1G port that you use for ssh aside from all this other stuff is worth a very large amount | 16:32 |
noonedeadpunk | so likely this vlan is only available inside the switch and not passed further on | 16:32 |
OutBackDingo | bridges: | 16:33 |
OutBackDingo | br-mgmt: | 16:33 |
OutBackDingo | interfaces: | 16:33 |
OutBackDingo | - bond0 | 16:33 |
OutBackDingo | mtu: 9000 | 16:33 |
OutBackDingo | yupp seems the infra host i was on is on bond0 also | 16:33 |
OutBackDingo | so it wouldnt be able to talk to same network on bond0.1696 | 16:34 |
noonedeadpunk | but before changing everything, it's worth thinking if you _really_ need to provide vlans to your tenants in addition to vxlan. Likely you might want that if you decide to deploy trove or octavia... | 16:34 |
OutBackDingo | therefore, can i move the ip from br-mgmt to br-vlan | 16:34 |
noonedeadpunk | I won't do that | 16:34 |
OutBackDingo | and reverse the bond0 / bond0.1696 | 16:34 |
noonedeadpunk | as I said, br-vlan would be taken over by neutron | 16:34 |
noonedeadpunk | which means it will be part of another bridge, so an IP won't work basically | 16:35 |
noonedeadpunk | so in fact br-vlan should be just a regular interface, like bond0 | 16:35 |
jrosser | ultimately you have to pass an interface to neutron, not a bridge, right? | 16:36 |
noonedeadpunk | yup | 16:36 |
noonedeadpunk | it can be bridge though | 16:36 |
noonedeadpunk | but it's quite obscure to see bridge inside bridge | 16:37 |
noonedeadpunk | or well, not a bridge inside a bridge, but a vlan created on top of a bridge, inside another bridge | 16:37 |
noonedeadpunk | so like br-vlan.1000 inside a bridge generated by neutron | 16:38 |
noonedeadpunk | tons of unnecessary overhead | 16:38 |
noonedeadpunk | on the other hand, it's easy to switch the underlying interface | 16:39 |
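For the point about passing an interface rather than a bridge to neutron, this is roughly how a provider network can be mapped in openstack_user_config.yml so bare metal hosts bind bond0 directly while containers keep using br-vlan; the range and names below are placeholders, and the exact keys for a given release should be checked against openstack_user_config.yml.example in the openstack-ansible repo:

```yaml
# Hedged sketch of a provider_networks entry; values are placeholders.
global_overrides:
  provider_networks:
    - network:
        container_bridge: "br-vlan"
        container_type: "veth"
        container_interface: "eth11"
        host_bind_override: "bond0"   # hosts bind the bond itself instead of a bridge
        type: "vlan"
        range: "2464:2464"            # example provider VLAN from this discussion
        net_name: "vlan"
        group_binds:
          - neutron_linuxbridge_agent
```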
OutBackDingo | so basically what im hearing is br-mgmt w/ ip and br-vlan cannot both reside on bond0 | 16:40 |
OutBackDingo | i can't move br-mgmt w/ ip off bond0 to a vlan on say bond0.1696 unless i move literally all hosts to the same vlan bond0.1696 w/ ip | 16:41 |
OutBackDingo | and then add br-vlan to bond0 on all hosts | 16:41 |
OutBackDingo | right! | 16:41 |
noonedeadpunk | I have no idea why you can't move br-mgmt there | 16:42 |
OutBackDingo | noonedeadpunk: i can if i do it to every single host | 16:43 |
OutBackDingo | because every single host has br-mgmt w/ ip on bond0 | 16:44 |
noonedeadpunk | It's only a guess, but as I said, likely your bond0.1696 is not passed somewhere | 16:44 |
noonedeadpunk | like to the router | 16:44 |
noonedeadpunk | or not configured on the router properly | 16:44 |
noonedeadpunk | so networks that are defined there are not routable | 16:44 |
noonedeadpunk | which might be valid though if it wasn't the IP you're reaching the environment with | 16:44 |
OutBackDingo | exactly... | 16:45 |
OutBackDingo | its not an out of band mgmt ip, its the primary | 16:45 |
OutBackDingo | there isn't an out of band ip | 16:45 |
noonedeadpunk | but well, it's doable :) I mean I literally faced same thing when was deploying my first cloud years back :) | 16:45 |
jrosser | ^ mistake :) | 16:45 |
noonedeadpunk | and yeah... | 16:45 |
OutBackDingo | @jrosser i didn't design it | 16:46 |
jrosser | i have two completely independent means to get into each server besides the openstack mgmt interface | 16:46 |
noonedeadpunk | yup, me too | 16:46 |
jrosser | reality just is that you need some options for when $real-life does something unexpected | 16:46 |
OutBackDingo | well we have 3 networks on vlans per machine | 16:46 |
OutBackDingo | but if that bond0 never comes up for X reasons, you're done | 16:47 |
jrosser | but they're all on the same interface to the same switches..... | 16:47 |
OutBackDingo | yupp | 16:47 |
jrosser | for a production environment thats kind of not good when things go wrong | 16:47 |
OutBackDingo | which is a point ill raise to the higher powers | 16:48 |
jrosser | how do you upgrade the firmware on the nic, or some other disruptive thing..... | 16:48 |
OutBackDingo | as a "design" modification | 16:48 |
OutBackDingo | @jrosser disruptive as in renaming every primary bond0 to bond0.xxxx which has your primary IP | 16:49 |
OutBackDingo | i get it | 16:49 |
OutBackDingo | and can see it clearly | 16:49 |
jrosser | well yeah, if you need to reconfigure the interface in a way that tears down / rebuilds the config, you can't be ssh'd in over the same thing | 16:49 |
OutBackDingo | and maas deployed won't let you login to ipmi console either | 16:50 |
jrosser | an option is some sort of KVM / remote screen on IPMI, but thats really for emergency as its very inflexible | 16:50 |
jrosser | like no ssh keys, no copy/paste and so on | 16:50 |
OutBackDingo | @jrosser cannot console into ipmi with a maas deployment and log in, to my knowledge | 16:51 |
OutBackDingo | all hosts only accessible via ssh | 16:51 |
OutBackDingo | can only reboot into rescue, undo the breakage and reboot | 16:52 |
jrosser | something else to feed back is the 100G NIC are really overkill | 16:52 |
jrosser | you won't be able to utilise even a fraction of that with linuxbridge | 16:52 |
OutBackDingo | personally i'd break the bonded pair into two separate interfaces | 16:52 |
jrosser | maybe - upgrading switch firmware has things to say about that | 16:53 |
jrosser | you can figure a lot of this out by working through what you'd do for management / operational tasks | 16:54 |
jrosser | like replace a server / upgrade a nic firmware / deal with a broken switch | 16:54 |
jrosser | how do you keep things working enough in all those situations and still have sufficient access to things | 16:54 |
OutBackDingo | either way i can see where the deployment as far as network config goes needs a "back door" for ssh access | 16:54 |
jrosser | what must always carry on, and what is only an inconvenience if it's down | 16:55 |
jrosser | openstack-ansible has no problem with that alternative ssh access being the one that the playbooks run over | 16:55 |
*** dviroel|lunch is now known as dviroel | 17:00 | |
admin1 | OutBackDingo, does it ping if you give the ip directly on bond0 and nothing else ? remove everything else .. | 17:01 |
OutBackDingo | yes as br-mgmt is on bond0 | 17:01 |
OutBackDingo | and its primary ip | 17:01 |
OutBackDingo | then the 10.16.48 network works on the interface with plain dhcp, even on a single 100gb nic, or bridge / bond0 | 17:07 |
OutBackDingo | i guess a good logic test is to move br-mgmt w/ ip to a bond0.1696 vlan on 2 hosts and test the connectivity between the two hosts | 17:09 |
jrosser | sounds good - start as simple as you can and test connectivity at each step | 17:13 |
OutBackDingo | LOL i need a sed via ssh to every host to rewrite this file and reboot all nodes | 17:21 |
admin1 | first test it in 2 only ... | 17:50 |
admin1 | your config is wrong though .. you are adding bond0 on top of br-mgmt and br-vlan is on top of bond0.1672 and 1680 .. so your vlan will not work | 17:51 |
OutBackDingo | @admin1 uhmmm | 17:52 |
OutBackDingo | ill check that | 17:53 |
OutBackDingo | @admin1 meanwhile here is the proposed fix | 17:53 |
OutBackDingo | https://pastebin.com/6tHAJSzJ | 17:53 |
OutBackDingo | based on your netplan, with our interfaces/bonds/vlans | 17:54 |
admin1 | yes, but here you don't have your ssh ip :) | 17:56 |
OutBackDingo | @admin1 sure i do | 17:57 |
OutBackDingo | 10.16.48.23/24 | 17:57 |
OutBackDingo | last line | 17:57 |
admin1 | ok .. as long as you can reach it , that is fine | 17:57 |
OutBackDingo | br-mgmt: | 17:57 |
OutBackDingo | mtu: 9000 | 17:57 |
OutBackDingo | interfaces: [ bond0.1696 ] | 17:57 |
OutBackDingo | addresses: | 17:57 |
OutBackDingo | - 10.16.48.23/24 | 17:57 |
admin1 | now add br-vlan and give it bond0 | 17:57 |
admin1 | don't add any ip or anything .. just br-vlan .. interfaces bond0 .. dhcp4 false dhcp6 false | 17:58 |
admin1 | that should not bring anything down | 17:58 |
OutBackDingo | @admin1 final https://pastebin.com/LnHAZAvq | 18:01 |
OutBackDingo | hah ill add the dhcp4 / dhcp6 false | 18:02 |
admin1 | yep .. do the same in another node and test ping between the br-storage, vlan and mgmt | 18:02 |
admin1 | if they all ping fine, replicate to all nodes | 18:02 |
OutBackDingo | and no nameservers / no macs, correct? | 18:03 |
admin1 | macs are not needed since all ips are static | 18:03 |
admin1 | and nameservers are needed | 18:03 |
admin1 | so that apt-get update etc will work | 18:03 |
OutBackDingo | well nameservers go in /etc/resolv.conf also | 18:04 |
OutBackDingo | ahhh netplan populates it | 18:04 |
admin1 | netplan does it for you | 18:04 |
OutBackDingo | so just on br-mgmt | 18:04 |
admin1 | right | 18:04 |
admin1 | after that, run curl gw.am .. if dns and routing is good, it will return back your current outgoing public ip | 18:05 |
admin1 | when working with a lot of clouds, and to check if all is good, i wrote my own service on gw.am which shows back your ip in curl | 18:05 |
admin1 | that way, i know a vm is working fine | 18:05 |
OutBackDingo | @admin ok last one https://pastebin.com/ksg2kSCD | 18:07 |
admin1 | looks good .. i never use search domains though .. search domain maas .. | 18:08 |
admin1 | i would also add dhcp4/6 false under storage and others to prevent it looking for dhcp during bootup | 18:09 |
admin1 | so storage and vxlan also add dhcpX false | 18:10 |
jrosser | when thinking about the mtu two of these matter | 18:11 |
jrosser | the vxlan packets ideally are roughly 1500+vxlan header | 18:11 |
jrosser | and if you want best performance from some shared storage then you'd want that as 9000 | 18:12 |
jrosser | but otherwise things that need to connect outside of your deployment should stay as 1500 | 18:12 |
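A small netplan fragment illustrating that split, assuming the switch path carries jumbo frames and the parent bond (and any tagged sub-interfaces) are also raised to 9000; the VLAN IDs are the ones already mentioned in this conversation:

```yaml
# Illustrative MTU split; jumbo frames must be supported end-to-end on the 9000 paths.
bridges:
  br-vxlan:
    interfaces: [bond0.1680]
    mtu: 9000     # leaves room for the ~50 byte vxlan header over 1500-byte guest MTUs
  br-storage:
    interfaces: [bond0.1672]
    mtu: 9000
  br-mgmt:
    interfaces: [bond0.1696]
    mtu: 1500     # traffic that leaves the deployment stays at the default
```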
OutBackDingo | so add the dhcp false, and add mtu 9000 | 18:16 |
OutBackDingo | @jrosser all storage is on the nodes | 18:17 |
OutBackDingo | nothing external | 18:17 |
OutBackDingo | its all ceph | 18:17 |
OutBackDingo | and i really appreciate all the input, help, criticism and insight | 18:18 |
OutBackDingo | from you all | 18:18 |
jrosser | when you get this working pay attention to where the storage traffic actually goes | 18:18 |
OutBackDingo | jrosser: meaning ? | 18:19 |
jrosser | if you follow the layout in the openstack-ansible all-in-one as a reference then you will have ceph traffic on the mgmt network | 18:19 |
OutBackDingo | all compute nodes have their own "ceph osds" and we used ceph-ansible on the nodes prior, we didn't deploy openstack-ansible's ceph | 18:20 |
jrosser | ensure that the ceph cluster and replication network cidr end up as you expect in the various ceph.conf | 18:20 |
jrosser | oh ok sure that’s ok | 18:20 |
OutBackDingo | @jrosser thumbs up! | 18:20 |
jrosser | make sure you reserve sufficient hypervisor memory for what ceph needs when it’s converged like this - admin1 did you ever run with osd on compute nodes? | 18:22 |
admin1 | i tested HCI .. but i have not one in production | 18:24 |
admin1 | can't properly plan resources around it .. | 18:24 |
jrosser | OutBackDingo: more experiences here ^ | 18:25 |
admin1 | if you have ceph with ec2, and an osd goes down and at the same time you end up with fairly busy instances on a 1:6 share or even 1:4 share, you see cpu spikes | 18:25 |
jrosser | I’ve got a few osd hosts with more ram than some compute nodes | 18:26 |
admin1 | plus it makes the resource planning and ratios usage calculation not exact .. like how much are you going to oversell/overuse cpu and ram | 18:26 |
jrosser | when ceph has to deal with dead disks or a big rebalance then the memory usage can be really significant | 18:27 |
jrosser | OutBackDingo: “in general” most serious deployments have kept the ceph hardware separate from compute nodes | 18:28 |
jrosser | as admin1 says, dealing with resource allocation can become very tricky | 18:28 |
OutBackDingo | small 6 node cluster each with 512 memory | 18:30 |
jrosser | amusingly the HCI vendors are now pushing this fancy new model with separate storage nodes :) | 18:30 |
OutBackDingo | all nodes have like 30+ TB storage in osd, + an intel nvme card | 18:33 |
OutBackDingo | so yeah small clusters | 18:33 |
OutBackDingo | and with that it's 1:30AM here, so time to sleep | 18:36 |
OutBackDingo | i'll catch up tomorrow and let all know how it goes | 18:36 |
admin1 | good luck | 18:40 |
admin1 | anyone confirmed yet for openstack summit berlin ? | 18:41 |
mgariepy | hmm, for auto patches from tests, where are the rules defined ? as in https://review.opendev.org/c/openstack/openstack-ansible-tests/+/835468 this one should be pushed to all the other repos. | 19:40 |
mgariepy | noonedeadpunk, jrosser ^^ | 19:41 |
jrosser | I don’t think we sync tox.ini | 19:41 |
jrosser | oh it’s setup.py…. | 19:42 |
mgariepy | also where is that job defined ? | 19:42 |
mgariepy | lol | 19:42 |
mgariepy | the sync one. | 19:43 |
jrosser | I think it’s partly in the tests repo and partly in (?)system-config | 19:43 |
jrosser | I’m still not sure that setup.py is synced | 19:45 |
jrosser | but there are a ton of our repos broken for the same thing | 19:45 |
mgariepy | nope, not synced | 19:46 |
mgariepy | https://github.com/openstack/openstack-ansible-tests/blob/master/sync-test-repos.sh#L118 | 19:46 |
mgariepy | i guess that even if we do add it it won't sync back the old commits.. | 19:48 |
mgariepy | i will patch the repos in like 20 minutes, let's see what we do for that one after. | 19:50 |
opendevreview | Merged openstack/openstack-ansible-plugins master: Update ssh_keypairs role to fix module for Rocky Linux 8 https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/835152 | 20:05 |
opendevreview | Marc Gariépy proposed openstack/ansible-role-python_venv_build master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/835892 | 20:15 |
opendevreview | Marc Gariépy proposed openstack/ansible-role-systemd_mount master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/ansible-role-systemd_mount/+/835893 | 20:15 |
opendevreview | Marc Gariépy proposed openstack/ansible-role-systemd_service master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/835894 | 20:16 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-ceph_client master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/835895 | 20:17 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_aodh master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_aodh/+/835896 | 20:17 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_ceilometer master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/835897 | 20:18 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_glance master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/835898 | 20:18 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_gnocchi master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_gnocchi/+/835899 | 20:19 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_magnum master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/835900 | 20:19 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_manila master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/835901 | 20:19 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_neutron master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/835902 | 20:20 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_octavia master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/835903 | 20:20 |
* NeilHanlon mutes his email notifications for a bit :) | 20:20 | |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_sahara master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_sahara/+/835904 | 20:20 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_tacker master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_tacker/+/835905 | 20:20 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_trove master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/835906 | 20:21 |
mgariepy | lol .. sorry .. | 20:21 |
NeilHanlon | hehe no worries | 20:21 |
NeilHanlon | i needed to setup a mail filter anyways, just sorta forces the issue :D | 20:21 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_zun master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/835907 | 20:22 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_horizon master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/835908 | 20:28 |
mgariepy | jrosser, not all repos had the setup.py .. .. | 20:28 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_tempest master: Disable setuptools auto discovery https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/835909 | 20:30 |
mgariepy | should be the last one.. lol | 20:30 |
jrosser | mgariepy: I expect we have a few different issues to look at | 20:30 |
mgariepy | probably lol | 20:31 |
mgariepy | at least the doc seems to be passing :D | 20:36 |
mgariepy | i should have added a topic :/ | 20:39 |
mgariepy | well these are the only patches i had in review so.. | 20:39 |
NeilHanlon | hmm, is it worth me code reviewing your changes mgariepy if I can only give them a +1 ? | 21:02 |
jrosser | NeilHanlon: that is the pathway to +2, if you are interested in that…. | 21:04 |
NeilHanlon | sounds dangerous ;) | 21:14 |
*** dviroel is now known as dviroel|out | 21:16 | |
*** ianw_pto is now known as ianw | 22:24 | |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible-os_tempest master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/835724 | 23:04 |