*** ysandeep is now known as ysandeep|lunch | 08:38 | |
opendevreview | Merged openstack/openstack-ansible stable/wallaby: Bump OpenStack-Ansible for Wallaby https://review.opendev.org/c/openstack/openstack-ansible/+/849799 | 09:20 |
jrosser | ^ hopefully this means we can now merge things on xena | 09:50 |
noonedeadpunk | oh, yes, I believe we should be able now | 10:16 |
*** ysandeep|lunch is now known as ysandeep | 10:23 | |
*** dviroel_ is now known as dviroel | 11:35 | |
jrosser | centos-8 on xena looks pretty broken | 13:14 |
jrosser | https://paste.opendev.org/show/bbxpY7ZJU1fIKdA9w4HO/ | 13:16 |
noonedeadpunk | ok, so they're dropping older versions from that repo over time. damn | 13:18 |
noonedeadpunk | that really does suck | 13:18 |
jrosser | yeah it's just not there any more https://cloudsmith.io/~rabbitmq/repos/rabbitmq-erlang/packages/?q=version%3A24.%2A-1.el8&page=3 | 13:23 |
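(For anyone hitting the same thing: a minimal sketch of how to check which erlang builds the configured repos still carry on an EL8 host. The exact repo id of the cloudsmith rabbitmq-erlang repository differs per deployment, so check `dnf repolist` first.)

```sh
# Minimal sketch: list the erlang versions still downloadable from the configured
# repos on an EL8 host. The cloudsmith rabbitmq-erlang repo id varies per setup.
dnf repolist
dnf list --showduplicates erlang
```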
spatel | jrosser centos-8 isn't end of life? | 13:42 |
noonedeadpunk | I bet we were talking about Stream which is not | 14:13 |
spatel | makes sense | 14:30 |
spatel | jrosser noonedeadpunk i have created a blog post for ovn deployment using OSA - https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/ | 14:34 |
spatel | I will add more troubleshooting scenarios in the coming days.. | 14:34 |
jrosser | spatel: so do all gateways go to the highest priority chassis or can they be spread? | 14:40 |
jrosser | like not DVR, but if you have N "network nodes" for example | 14:40 |
spatel | They always go to the highest-priority gateway in an active-standby config | 14:50 |
spatel | Let's say i set the priorities manually; then the last one automatically becomes the active one. | 14:50 |
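(For context, a minimal sketch of how gateway chassis priorities look from the OVN side; the router port and chassis names below are hypothetical, and the chassis with the highest priority value hosts the active gateway.)

```sh
# Minimal sketch: inspect and set gateway chassis priorities on an OVN router port.
# "lr0-public", "gw-node-1" and "gw-node-2" are made-up names.
ovn-nbctl lrp-get-gateway-chassis lr0-public
ovn-nbctl lrp-set-gateway-chassis lr0-public gw-node-1 30   # highest priority -> active
ovn-nbctl lrp-set-gateway-chassis lr0-public gw-node-2 20   # standby
```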
jrosser | thats a bit sad as i think the current L3 agent spreads the active ones around | 14:51 |
spatel | How? | 14:52 |
jrosser | well it's keepalived ultimately | 14:52 |
spatel | We are talking about a tenant virtual router here; how can you set up an active-active router? | 14:52 |
jrosser | yes | 14:52 |
spatel | If you set up DVR with ovn then yes.. each compute node will be your router and vm traffic will go out directly from that gateway | 14:54 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/xena: Sync RedHat erlang version https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/850233 | 14:55 |
jrosser | spatel: well thats kind of not what i mean | 14:56 |
noonedeadpunk | But then you need to pass public vlan to each compute I guess? | 14:56 |
jrosser | DVR can be wasteful of external IP and you need the public network everywhere | 14:56 |
spatel | noonedeadpunk yes, that is correct.. | 14:56 |
spatel | jrosser not in OVN based DVR | 14:56 |
spatel | OVN-based DVR doesn't waste public IPs :) | 14:57 |
mgariepy | i don't think you need an ip on the public net for it to work; only the l2 needs to be there for the network. | 14:57 |
spatel | all the magic happens inside openflow | 14:57 |
spatel | mgariepy yes just need public VLAN connectivity | 14:58 |
spatel | legacy DVR wastes a public IP for each compute node, but OVN doesn't. | 14:59 |
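(A hedged sketch of how distributed floating IPs are usually switched on for ML2/OVN in an OSA deployment; the override variable and the [ovn] option should be double-checked against the os_neutron role and the neutron OVN driver docs for your release.)

```sh
# Hedged sketch: enable DVR-style (distributed) floating IPs with ML2/OVN through
# an OSA config override. Verify the variable name and option against your release.
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
neutron_ml2_conf_ini_overrides:
  ovn:
    enable_distributed_floating_ip: True
EOF
```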
noonedeadpunk | I'm really eager to test the vpnaas patch as well as the bgp implementation with OVN.... | 15:00 |
spatel | I am on it.. trying to deploy BGP-based OVN (i am stuck in devstack, it's causing issues deploying the stack) | 15:01 |
spatel | Thinking of deploying OSA instead of devstack | 15:01 |
spatel | the beauty of OVN is you can buy a good smartnic for a dedicated network node and offload ovs onto the nic to boost performance for the network node | 15:02 |
jrosser | ^ do you actually make this work? | 15:03 |
spatel | smartnic ? | 15:03 |
jrosser | yes | 15:03 |
spatel | looking for sponsor :( | 15:03 |
jrosser | anyway - regarding L3 HA this suggests that the active routers are not always the same chassis https://docs.openstack.org/neutron/latest/admin/ovn/routing.html#l3ha-support | 15:03 |
jrosser | though surprising choice to have each compute node hit all the gateways constantly with BFD | 15:04 |
jrosser | thats going to scale interestingly | 15:04 |
* jrosser old enough to remember cisco 6500 with not enough CPU power to do BFD on all the ports concurrently. that got interesting if you tried to..... | 15:05 | |
mgariepy | lol | 15:06 |
mgariepy | didn't your friendly cisco support expert help you with that? | 15:06 |
jrosser | oh well we had people who knew better than to try it | 15:08 |
spatel | are you concerned about BFD running on all compute nodes :) | 15:08 |
jrosser | and people, unfortunately, who didn't | 15:08 |
*** dviroel is now known as dviroel|lunch | 15:08 | |
jrosser | spatel: well it's maybe just surprising from an architecture POV - you have hundreds of compute nodes dont you? | 15:08 |
noonedeadpunk | I'm personally concerned about passing the public net to each compute node... | 15:09 |
jrosser | ^ this | 15:09 |
jrosser | I dont / wont do that | 15:09 |
jrosser | though i would love to see offloaded L3 agent actually working | 15:09 |
noonedeadpunk | Oh yes | 15:10 |
spatel | noonedeadpunk it's a trade-off: performance / high availability vs security :) being a public cloud company, i can understand | 15:10 |
spatel | in our case we are running a private cloud and need as much performance as possible with zero downtime. | 15:11 |
jrosser | i think my concern with BFD is how little packet loss you'd need to fail out a gateway node | 15:14 |
jrosser | because that's the point, to give extremely fast failover | 15:14 |
jrosser | and the cpu in the gateway node is handling both control plane and data plane, some data plane overload would break the control plane | 15:15 |
jrosser | which is totally different to how a hardware router would deal with it | 15:15 |
*** ysandeep is now known as ysandeep|out | 15:17 | |
admin1 | is no one else getting => galera_server : Fail if galera_cluster_name doesnt match provided value when doing upgrades (minor and also major)? | 15:57 |
admin1 | i seem to always get it | 15:57 |
spatel | jrosser I am sure you can control the BFD packet rate (per second/minute etc.., dead timer/hold timer), and you can pin host CPUs or ovs threads to specific CPUs for better control so they don't get overloaded | 16:05 |
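(For reference, a hedged sketch of where to look at the BFD sessions ovn-controller sets up towards the gateway chassis; manually changing timers here may well be overwritten, since ovn-controller owns those ports.)

```sh
# Hedged sketch: inspect BFD config/state on the geneve tunnel ports that
# ovn-controller creates on br-int (usually named ovn-<chassis>-<n>).
ovs-vsctl list-ports br-int | grep '^ovn-'
ovs-vsctl get Interface <tunnel-port> bfd bfd_status
```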
spatel | admin1 post full error.. i believe i have seen it | 16:06 |
admin1 | running again now .. will post once i hit the error | 16:11 |
admin1 | https://gist.github.com/a1git/a2368b36dd8465f13c829c2354515cfc | 16:12 |
*** dviroel_ is now known as dviroel | 16:15 | |
spatel | admin1 mostly that means the cluster is not happy | 16:17 |
admin1 | but the cluster is happy, all is in sync, the name is good | 16:22 |
spatel | did you query cluster name in DB? | 16:28 |
spatel | that playbook tries to match the db-stored name with the file-stored name.. i may need to check that task to understand | 16:29 |
admin1 | also during the upgrade, some process creates folders in /var/lib/mysql like #tmp and tmp.xxxxx which are not valid database names (but which appear as database names) | 16:45 |
spatel | hmm | 16:49 |
admin1 | ansible galera_container -m shell -a "mysql -h localhost -e 'show variables like \"%wsrep_cluster_name%\";'" - all 3 return openstack_galera_cluster | 16:54 |
jrosser | admin1: there are fixes for that #tmp stuff | 17:05 |
jrosser | you need to look at the patches we merged for that and check if you are using them | 17:05 |
spatel | admin1 i always set this in my user_variables.yml :) i know it's the default but still i set galera_cluster_name: openstack_galera_cluster | 17:06 |
admin1 | i am upgrading from the latest 24.x to 25.0.0 -- | 17:12 |
jrosser | early adopter :) | 17:13 |
admin1 | someone has to :) | 17:13 |
jrosser | https://github.com/openstack/openstack-ansible-galera_server/commit/ebc0417919fcedd924fa5a21107055a433eca6f6 | 17:14 |
jamesdenton | also upgrading... running into an issue in lxc_hosts, seems ca-certificates needs to be installed in ubuntu-20-amd64... https://paste.opendev.org/show/bsvKILJ5V3woJvVHVkma/ | 17:16 |
jamesdenton | verifying that theory now | 17:16 |
jrosser | interesting | 17:18 |
spatel | jamesdenton i have noticed that with version 20.04.1 but if you have ubuntu 20.04.4 you should be ok.. i believe OSA does it by default when it runs lxc_hosts | 17:20 |
jrosser | ca-certificates is certainly installed in the lxc image https://github.com/openstack/openstack-ansible-lxc_hosts/blob/c679877abaaf4b8449c05def5e4f3969ebf2dd65/vars/debian.yml#L42 | 17:20 |
jrosser | but if somehow that decides to use https (which it kind of shouldn't) you would be in a chicken/egg situation | 17:20 |
jamesdenton | i think it is chicken/egg, but for a different reason. i think ca-certificates is needed before pkg.osquery.io repo can be added | 17:44 |
jamesdenton | https://paste.opendev.org/show/bOl1SeK5Q6wykAutjLwH/ | 17:44 |
jrosser | you might need some Acquire::https::repo.domain.tld::Verify-Peer "false"; / Acquire::https::repo.domain.tld::Verify-Host "false"; in the host's apt.conf to make that work | 17:48 |
jrosser | that will be copied into the lxc cache before the prep script is run https://github.com/openstack/openstack-ansible-lxc_hosts/blob/c679877abaaf4b8449c05def5e4f3969ebf2dd65/vars/debian.yml#L24 | 17:49 |
jrosser | though it's ugly | 17:49 |
jrosser | alternative is to locally mirror (or reverse proxy) the osquery repo at an http endpoint | 17:50 |
jrosser | it's a bit tricky - as we can't make any assumptions about what the host prep has done with /etc/apt/.... so just copy the whole lot to the container base image | 17:52 |
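(A sketch of what that apt.conf workaround could look like on the host, assuming pkg.osquery.io is the only https source that has to skip verification; the file name is illustrative.)

```sh
# Hedged sketch of jrosser's suggestion: relax TLS verification for one https repo
# in the host's apt config, so the same config copied into the lxc cache works
# before ca-certificates is installed. File name and host are examples only.
cat > /etc/apt/apt.conf.d/99-osquery-noverify <<'EOF'
Acquire::https::pkg.osquery.io::Verify-Peer "false";
Acquire::https::pkg.osquery.io::Verify-Host "false";
EOF
```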
jamesdenton | or.. https://paste.opendev.org/show/btmSPKASGeF7ZPKJ2kNH/... Line 16 :D | 18:10 |
jamesdenton | aka i just installed ca-certificates higher in debian_prep.sh, before the apt update | 18:40 |
jamesdenton | i guess lxc_cache_prep_pre_commands could be used | 18:41 |
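(A rough sketch of the lxc_cache_prep_pre_commands idea; the exact semantics of that hook, and whether the default http repos are usable at that point, should be checked against the lxc_hosts role.)

```sh
# Hedged sketch: use the lxc_cache_prep_pre_commands hook named above to pull in
# ca-certificates before the main apt update in the cache prep script.
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
lxc_cache_prep_pre_commands: |
  apt-get update || true
  apt-get install -y ca-certificates
EOF
```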
jamesdenton | spatel this is 20.04.4, so not sure what's different | 18:51 |
spatel | very odd.. i had the same issue last week with 20.04.1 but later when i deployed osa with 20.04.4 i had no issue | 18:55 |
jrosser | jamesdenton: does that work even when creating the container cache from nothing? I guess there is sufficient repo configuration from debootstrap | 19:39 |
jrosser | though I think one of the reasons the apt config is copied in early is to account for any mirrors or proxies defined on the host | 19:40 |
*** tosky_ is now known as tosky | 19:45 | |
admin1 | quick check .. on one of my controllers, i have like 15k threads .. if you run a busy controller, how many threads do you guys see without being bothered about it? | 20:05 |
spatel | admin1 what are those threads? | 20:17 |
spatel | nova/neutron blah.. | 20:17 |
admin1 | spatel https://gist.github.com/a1git/319e4b591ab18b26fa5892f0ab7e4c72 | 20:20 |
spatel | looks ok to me.. mostly when i deploy multiple roles on a single server i individually set workers so as not to overload the box | 20:24 |
spatel | by default OSA does math with the number of cpu cores times some factor to set workers | 20:25 |
spatel | i mostly start with 2 workers and then add more if i need more.. | 20:25 |
spatel | neutron_rpc_workers: 4 | 20:26 |
spatel | example | 20:26 |
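(A minimal sketch of pinning worker counts via user_variables.yml as spatel describes; neutron_rpc_workers is taken from the discussion above, other *_workers variables need checking against each role's defaults.)

```sh
# Minimal sketch: cap a worker count instead of letting it scale with CPU count.
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
neutron_rpc_workers: 2
# other services expose similar *_workers variables; check each role's defaults
EOF
```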
admin1 | ok | 20:37 |
*** dviroel is now known as dviroel|out | 21:39 |