*** ysandeep is now known as ysandeep|lunch | 08:38 | |
opendevreview | Merged openstack/openstack-ansible stable/wallaby: Bump OpenStack-Ansible for Wallaby https://review.opendev.org/c/openstack/openstack-ansible/+/849799 | 09:20 |
jrosser | ^ hopefully this means we can now merge things on xena | 09:50 |
noonedeadpunk | oh, yes, I believe we should be able now | 10:16 |
*** ysandeep|lunch is now known as ysandeep | 10:23 | |
*** dviroel_ is now known as dviroel | 11:35 | |
jrosser | centos-8 on xena looks pretty broken | 13:14 |
jrosser | https://paste.opendev.org/show/bbxpY7ZJU1fIKdA9w4HO/ | 13:16 |
noonedeadpunk | ok, so they're dropping older versions from that repo over time. damn | 13:18 |
noonedeadpunk | that really does suck | 13:18 |
jrosser | yeah it's just not there any more https://cloudsmith.io/~rabbitmq/repos/rabbitmq-erlang/packages/?q=version%3A24.%2A-1.el8&page=3 | 13:23 |
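(For anyone hitting the same thing: a minimal sketch of how to check which erlang builds the configured repos still carry on an EL8 host. The exact repo id of the cloudsmith rabbitmq-erlang repository differs per deployment, so check `dnf repolist` first.)

```sh
# Minimal sketch: list the erlang versions still downloadable from the configured
# repos on an EL8 host. The cloudsmith rabbitmq-erlang repo id varies per setup.
dnf repolist
dnf list --showduplicates erlang
```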
spatel | jrosser centos-8 isn't end of life? | 13:42 |
noonedeadpunk | I bet we were talking about Stream which is not | 14:13 |
spatel | makes sense | 14:30 |
spatel | jrosser noonedeadpunk i have created a blog post for ovn deployment using OSA - https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/ | 14:34 |
spatel | I will add more troubleshooting scenarios in the coming days.. | 14:34 |
jrosser | spatel: so do all gateways go to the highest priority chassis or can they be spread? | 14:40 |
jrosser | like not DVR, but if you have N "network nodes" for example | 14:40 |
spatel | They always go to the highest-priority gateway in an active-standby config | 14:50 |
spatel | Let's say i set the priorities manually; then the last one automatically becomes the active one. | 14:50 |
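(For context, a minimal sketch of how gateway chassis priorities look from the OVN side; the router port and chassis names below are hypothetical, and the chassis with the highest priority value hosts the active gateway.)

```sh
# Minimal sketch: inspect and set gateway chassis priorities on an OVN router port.
# "lr0-public", "gw-node-1" and "gw-node-2" are made-up names.
ovn-nbctl lrp-get-gateway-chassis lr0-public
ovn-nbctl lrp-set-gateway-chassis lr0-public gw-node-1 30   # highest priority -> active
ovn-nbctl lrp-set-gateway-chassis lr0-public gw-node-2 20   # standby
```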
jrosser | thats a bit sad as i think the current L3 agent spreads the active ones around | 14:51 |
spatel | How? | 14:52 |
jrosser | well it's keepalived ultimately | 14:52 |
spatel | We are talking about a tenant virtual router here; how can you set up an active-active router? | 14:52 |
jrosser | yes | 14:52 |
spatel | If you set up DVR with ovn then yes.. each compute node will be your router and vm traffic will go out directly from that gateway | 14:54 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/xena: Sync RedHat erlang version https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/850233 | 14:55 |
jrosser | spatel: well thats kind of not what i mean | 14:56 |
noonedeadpunk | But then you need to pass public vlan to each compute I guess? | 14:56 |
jrosser | DVR can be wasteful of external IP and you need the public network everywhere | 14:56 |
spatel | noonedeadpunk yes, that is correct.. | 14:56 |
spatel | jrosser not in OVN based DVR | 14:56 |
spatel | OVN-based DVR doesn't waste public IPs :) | 14:57 |
mgariepy | i don't think you need an ip on the public net for it to work; only the l2 needs to be there for the network. | 14:57 |
spatel | all the magic happens inside openflow | 14:57 |
spatel | mgariepy yes just need public VLAN connectivity | 14:58 |
spatel | legacy DVR wastes a public IP for each compute node, but OVN doesn't. | 14:59 |
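(A hedged sketch of how distributed floating IPs are usually switched on for ML2/OVN in an OSA deployment; the override variable and the [ovn] option should be double-checked against the os_neutron role and the neutron OVN driver docs for your release.)

```sh
# Hedged sketch: enable DVR-style (distributed) floating IPs with ML2/OVN through
# an OSA config override. Verify the variable name and option against your release.
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
neutron_ml2_conf_ini_overrides:
  ovn:
    enable_distributed_floating_ip: True
EOF
```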
noonedeadpunk | I'm really eager to test the vpnaas patch as well as the bgp implementation with OVN.... | 15:00 |
spatel | I am on it.. trying to deploy BGP-based OVN (i am stuck in devstack, it's causing issues deploying the stack) | 15:01 |
spatel | Thinking of deploying OSA instead of devstack | 15:01 |
spatel | the beauty of OVN is you can buy a good smartnic for a dedicated network node and offload ovs onto the nic to boost performance for the network node | 15:02 |
jrosser | ^ do you actually make this work? | 15:03 |
spatel | smartnic ? | 15:03 |
jrosser | yes | 15:03 |
spatel | looking for sponsor :( | 15:03 |
jrosser | anyway - regarding L3 HA this suggests that the active routers are not always the same chassis https://docs.openstack.org/neutron/latest/admin/ovn/routing.html#l3ha-support | 15:03 |
jrosser | though surprising choice to have each compute node hit all the gateways constantly with BFD | 15:04 |
jrosser | thats going to scale interestingly | 15:04 |
* jrosser old enough to remember cisco 6500 with not enough CPU power to do BFD on all the ports concurrently. that got interesting if you tried to..... | 15:05 | |
mgariepy | lol | 15:06 |
mgariepy | didn't your friendly cisco support expert help you with that? | 15:06 |
jrosser | oh well we had people who knew better than to try it | 15:08 |
spatel | are you concerned about BFD running on all compute nodes :) | 15:08 |
jrosser | and people, unfortunately, who didn't | 15:08 |
*** dviroel is now known as dviroel|lunch | 15:08 | |
jrosser | spatel: well it's maybe just surprising from an architecture POV - you have hundreds of compute nodes dont you? | 15:08 |
noonedeadpunk | I'm personally concerned about passing the public net to each compute node... | 15:09 |
jrosser | ^ this | 15:09 |
jrosser | I dont / wont do that | 15:09 |
jrosser | though i would love to see offloaded L3 agent actually working | 15:09 |
noonedeadpunk | Oh yes | 15:10 |
spatel | noonedeadpunk it's a trade-off: performance / high availability vs security :) being a public cloud company, i can understand | 15:10 |
spatel | in our case we are running a private cloud and need as much performance as possible with zero downtime. | 15:11 |
jrosser | i think my concern with BFD is how little packet loss you'd need to fail out a gateway node | 15:14 |
jrosser | because that's the point, to give extremely fast failover | 15:14 |
jrosser | and the cpu in the gateway node is handling both control plane and data plane, some data plane overload would break the control plane | 15:15 |
jrosser | which is totally different to how a hardware router would deal with it | 15:15 |
*** ysandeep is now known as ysandeep|out | 15:17 | |
admin1 | is no one else getting => galera_server : Fail if galera_cluster_name doesnt match provided value when doing upgrades (minor and also major)? | 15:57 |
admin1 | i seem to always get it | 15:57 |
spatel | jrosser I am sure you can control the BFD packet rate (per second/minute etc.., dead timer/hold timer), and you can pin host CPUs or ovs threads to specific CPUs for better control so they don't get overloaded | 16:05 |
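(For reference, a hedged sketch of where to look at the BFD sessions ovn-controller sets up towards the gateway chassis; manually changing timers here may well be overwritten, since ovn-controller owns those ports.)

```sh
# Hedged sketch: inspect BFD config/state on the geneve tunnel ports that
# ovn-controller creates on br-int (usually named ovn-<chassis>-<n>).
ovs-vsctl list-ports br-int | grep '^ovn-'
ovs-vsctl get Interface <tunnel-port> bfd bfd_status
```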
spatel | admin1 post full error.. i believe i have seen it | 16:06 |
admin1 | running again now .. will post once i hit the error | 16:11 |
admin1 | https://gist.github.com/a1git/a2368b36dd8465f13c829c2354515cfc | 16:12 |
*** dviroel_ is now known as dviroel | 16:15 | |
spatel | admin1 mostly that means the cluster is not happy | 16:17 |
admin1 | but the cluster is happy, all is in sync, the name is good | 16:22 |
spatel | did you query cluster name in DB? | 16:28 |
spatel | that playbook tries to match the db-stored name with the file-stored name.. i may need to check that task to understand | 16:29 |
admin1 | also during the upgrade, some process creates folders in /var/lib/mysql like #tmp and tmp.xxxxx which are not valid database names (but which appear as database names) | 16:45 |
spatel | hmm | 16:49 |
admin1 | ansible galera_container -m shell -a "mysql -h localhost -e 'show variables like \"%wsrep_cluster_name%\";'" - all 3 return openstack_galera_cluster | 16:54 |
jrosser | admin1: there are fixes for that #tmp stuff | 17:05 |
jrosser | you need to look at the patches we merged for that and check if you are using them | 17:05 |
spatel | admin1 i always set this in my user_variables.yml :) i know it's the default but still i set galera_cluster_name: openstack_galera_cluster | 17:06 |
admin1 | i am upgrading from the latest 24.x to 25.0.0 -- | 17:12 |
jrosser | early adopter :) | 17:13 |
admin1 | someone has to :) | 17:13 |
jrosser | https://github.com/openstack/openstack-ansible-galera_server/commit/ebc0417919fcedd924fa5a21107055a433eca6f6 | 17:14 |
jamesdenton | also upgrading... running into an issue in lxc_hosts, seems ca-certificates needs to be installed in ubuntu-20-amd64... https://paste.opendev.org/show/bsvKILJ5V3woJvVHVkma/ | 17:16 |
jamesdenton | verifying that theory now | 17:16 |
jrosser | interesting | 17:18 |
spatel | jamesdenton i have noticed that with version 20.04.1 but if you have ubuntu 20.04.4 you should be ok.. i believe OSA does it by default when it runs lxc_hosts | 17:20 |
jrosser | ca-certificates is certainly installed in the lxc image https://github.com/openstack/openstack-ansible-lxc_hosts/blob/c679877abaaf4b8449c05def5e4f3969ebf2dd65/vars/debian.yml#L42 | 17:20 |
jrosser | but if somehow that decides to use https (which it kind of shouldn't) you would be in a chicken/egg situation | 17:20 |
jamesdenton | i think it is chicken/egg, but for a different reason. i think ca-certificates is needed before pkg.osquery.io repo can be added | 17:44 |
jamesdenton | https://paste.opendev.org/show/bOl1SeK5Q6wykAutjLwH/ | 17:44 |
jrosser | you might need some Acquire::https::repo.domain.tld::Verify-Peer "false"; / Acquire::https::repo.domain.tld::Verify-Host "false"; in the host's apt.conf to make that work | 17:48 |
jrosser | that will be copied into the lxc cache before the prep script is run https://github.com/openstack/openstack-ansible-lxc_hosts/blob/c679877abaaf4b8449c05def5e4f3969ebf2dd65/vars/debian.yml#L24 | 17:49 |
jrosser | though it's ugly | 17:49 |
jrosser | alternative is to locally mirror (or reverse proxy) the osquery repo at an http endpoint | 17:50 |
jrosser | it's a bit tricky - as we can't make any assumptions about what the host prep has done with /etc/apt/.... so just copy the whole lot to the container base image | 17:52 |
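(A sketch of what that apt.conf workaround could look like on the host, assuming pkg.osquery.io is the only https source that has to skip verification; the file name is illustrative.)

```sh
# Hedged sketch of jrosser's suggestion: relax TLS verification for one https repo
# in the host's apt config, so the same config copied into the lxc cache works
# before ca-certificates is installed. File name and host are examples only.
cat > /etc/apt/apt.conf.d/99-osquery-noverify <<'EOF'
Acquire::https::pkg.osquery.io::Verify-Peer "false";
Acquire::https::pkg.osquery.io::Verify-Host "false";
EOF
```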
jamesdenton | or.. https://paste.opendev.org/show/btmSPKASGeF7ZPKJ2kNH/... Line 16 :D | 18:10 |
jamesdenton | aka i just installed ca-certificates higher in debian_prep.sh, before the apt update | 18:40 |
jamesdenton | i guess lxc_cache_prep_pre_commands could be used | 18:41 |
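(A rough sketch of the lxc_cache_prep_pre_commands idea; the exact semantics of that hook, and whether the default http repos are usable at that point, should be checked against the lxc_hosts role.)

```sh
# Hedged sketch: use the lxc_cache_prep_pre_commands hook named above to pull in
# ca-certificates before the main apt update in the cache prep script.
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
lxc_cache_prep_pre_commands: |
  apt-get update || true
  apt-get install -y ca-certificates
EOF
```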
jamesdenton | spatel this is 20.04.4, so not sure what's different | 18:51 |
spatel | very odd.. i had the same issue last week with 20.04.1 but later when i deployed osa with 20.04.4 i had no issue | 18:55 |
jrosser | jamesdenton: does that work even when creating the container cache from nothing? I guess there is sufficient repo configuration from debootstrap | 19:39 |
jrosser | though I think one of the reasons the apt config is copied in early is to account for any mirrors or proxies defined on the host | 19:40 |
*** tosky_ is now known as tosky | 19:45 | |
admin1 | quick check .. on one of my controllers, i have like 15k threads .. if you run a busy controller, how many threads do you guys see without being bothered about it? | 20:05 |
spatel | admin1 what are those threads? | 20:17 |
spatel | nova/neutron blah.. | 20:17 |
admin1 | spatel https://gist.github.com/a1git/319e4b591ab18b26fa5892f0ab7e4c72 | 20:20 |
spatel | looks ok to me.. mostly when i deploy multiple roles on a single server i individually set workers so as not to overload the box | 20:24 |
spatel | by default OSA does math with the number of cpu cores times some factor to set workers | 20:25 |
spatel | i mostly start with 2 workers and then add more if i need more.. | 20:25 |
spatel | neutron_rpc_workers: 4 | 20:26 |
spatel | example | 20:26 |
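(A minimal sketch of pinning worker counts via user_variables.yml as spatel describes; neutron_rpc_workers is taken from the discussion above, other *_workers variables need checking against each role's defaults.)

```sh
# Minimal sketch: cap a worker count instead of letting it scale with CPU count.
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
neutron_rpc_workers: 2
# other services expose similar *_workers variables; check each role's defaults
EOF
```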
admin1 | ok | 20:37 |
*** dviroel is now known as dviroel|out | 21:39 |