*** cshen has joined #openstack-ansible | 00:00 | |
*** cshen has quit IRC | 00:04 | |
*** luksky has quit IRC | 00:07 | |
*** macz_ has quit IRC | 00:12 | |
*** tosky has quit IRC | 00:18 | |
*** gyee has quit IRC | 01:01 | |
*** cshen has joined #openstack-ansible | 02:00 | |
*** spatel has joined #openstack-ansible | 02:04 | |
*** spatel has quit IRC | 02:04 | |
*** cshen has quit IRC | 02:05 | |
*** jhesketh has quit IRC | 02:27 | |
*** jhesketh has joined #openstack-ansible | 02:33 | |
*** spatel has joined #openstack-ansible | 03:08 | |
*** macz_ has joined #openstack-ansible | 03:26 | |
*** macz_ has quit IRC | 03:31 | |
*** d34dh0r53 has quit IRC | 03:48 | |
*** cshen has joined #openstack-ansible | 04:00 | |
*** cshen has quit IRC | 04:05 | |
*** spatel has quit IRC | 05:31 | |
*** evrardjp has joined #openstack-ansible | 05:33 | |
*** rh-jlabarre has quit IRC | 05:40 | |
*** alvinstarr has quit IRC | 05:42 | |
*** cshen has joined #openstack-ansible | 06:01 | |
*** cshen has quit IRC | 06:05 | |
*** yasemind34 has joined #openstack-ansible | 06:23 | |
*** cshen has joined #openstack-ansible | 06:25 | |
*** cshen has quit IRC | 06:30 | |
*** rpittau|afk is now known as rpittau | 06:52 | |
*** pto has joined #openstack-ansible | 06:53 | |
*** miloa has joined #openstack-ansible | 07:06 | |
openstackgerrit | Siavash Sardari proposed openstack/openstack-ansible-openstack_openrc stable/ussuri: Adding support of system scoped openrc and clouds.yaml https://review.opendev.org/763508 | 07:08 |
*** pto has quit IRC | 07:24 | |
*** pto_ has joined #openstack-ansible | 07:24 | |
*** pto has joined #openstack-ansible | 07:26 | |
*** pto_ has quit IRC | 07:29 | |
*** luksky has joined #openstack-ansible | 07:56 | |
*** pcaruana has joined #openstack-ansible | 08:03 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_ceilometer master: Unify deployment of ceilometer files https://review.opendev.org/762183 | 08:06 |
*** cshen has joined #openstack-ansible | 08:16 | |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_neutron master: Add centos-8 support for ovs-dpdk https://review.opendev.org/762729 | 08:26 |
*** andrewbonney has joined #openstack-ansible | 08:28 | |
*** cshen has quit IRC | 08:31 | |
*** fanfi has quit IRC | 08:56 | |
*** pto has quit IRC | 09:00 | |
*** pto_ has joined #openstack-ansible | 09:01 | |
*** pto has joined #openstack-ansible | 09:09 | |
*** pto_ has quit IRC | 09:11 | |
*** cshen has joined #openstack-ansible | 09:12 | |
noonedeadpunk | gixx: oh, also I dunno if you're using lbaas, but you need to do migration to octavia if you do. Also octavia upgrade from Q to U might not work well, at least they claim to support upgrade only for sequential releases https://docs.openstack.org/octavia/latest/admin/guides/upgrade.html | 09:36 |
admin0 | morning noonedeadpunk .. do you have a good howto to get octavia going with osa ? | 09:49
noonedeadpunk | admin0: only rackspace one and spatel blogpost | 09:50 |
noonedeadpunk | https://developer.rackspace.com/docs/private-cloud/rpc/master/rpc-octavia-internal/octavia-install-guide/ https://satishdotpatel.github.io//openstack-ansible-octavia/ | 09:50 |
admin0 | just curious, do they also have a kubernetes/magnum guide :) ? | 09:52 |
admin0 | this guide is a good start | 09:52 |
noonedeadpunk | haha | 09:52 |
noonedeadpunk | eventually the main problem with magnum is magnum itself | 09:52 |
noonedeadpunk | as to get it working you need to choose a magnum version that is working properly first | 09:53
noonedeadpunk | But iirc U magnum was pretty good in general | 09:53 |
admin0 | we have 21.1.0 and 21.2.0 -- aren't they good in these tags ? | 09:57 |
noonedeadpunk | should be I guess | 09:57 |
*** tosky has joined #openstack-ansible | 09:57 | |
noonedeadpunk | but you need coreos image here for sure | 09:58 |
admin0 | right now, i am doing 16->18 migration for one cluster | 10:05 |
admin0 | slow process :) | 10:05 |
admin0 | thanks everyone for these notes: https://etherpad.opendev.org/p/osa-rocky-bionic-upgrade | 10:06 |
noonedeadpunk | there's also https://docs.openstack.org/openstack-ansible/rocky/admin/upgrades/distribution-upgrades.html in case you haven't come around it | 10:07 |
noonedeadpunk | thanks to ebbex for it:) | 10:07 |
*** openstackgerrit has quit IRC | 10:25 | |
*** spatel has joined #openstack-ansible | 10:34 | |
*** spatel has quit IRC | 10:39 | |
*** SecOpsNinja has joined #openstack-ansible | 11:05 | |
*** jbadiapa has joined #openstack-ansible | 11:29 | |
admin0 | in one osa cluster where cinder, glance and nova are using ceph, i added a 2nd ceph just for cinder .. so i can create volumes in both .. but when i try to mount them, the volumes from the 2nd ceph cannot be mounted .. i get libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': error connecting: Permission denied | 11:39
admin0 | the hypervisor has no idea/keys of the 2nd ceph | 11:39 |
admin0 | if anyone knows or can point me to the right direction, it would help | 11:40 |
SecOpsNinja | hi everyone. in this project, shouldn't the rsyslog containers receive all the logs from the lxc containers and physical hosts? the strange part is that, following https://docs.openstack.org/openstack-ansible-rsyslog_server/latest/ops-logging.html#finding-logs, the directory /var/log/log-storage doesn't exist. I'm trying to troubleshoot a strange error where new instances stay | 11:51
SecOpsNinja | stuck forever in "Scheduling", but the only error i could find was from nova-api-wsgi, which returns ERROR oslo.messaging._drivers.impl_rabbit [req-68a168f1-4e5a-49c1-95c6-6ba6d8c69377 - - - - -] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 32.0 seconds): OSError: [Errno 113] EHOSTUNREACH. is there any way to track this req-68a1... and see where it is failing? | 11:51
SecOpsNinja | even http://paste.openstack.org/show/800250/ doesnt say much... | 11:54 |
*** mgariepy has quit IRC | 11:59 | |
jrosser | admin0: why two cephs? you can put the cinder pool on specific/different devices in a single ceph if you want to, it may not be necessary to have two clusters | 12:08 |
admin0 | some biz requirement :) | 12:08 |
admin0 | i just deliver :) | 12:09 |
jrosser | also when you boot from volume theres a no-op snapshot taken from the glance pool to cinder | 12:09 |
jrosser | you will miss out on that | 12:09 |
admin0 | SecOpsNinja, the rsyslog container is not used anymore if i recall .. it's all journald .. you can use something like graylog to centralize logging | 12:10
jrosser | admin0: https://medium.com/walmartglobaltech/deploying-cinder-with-multiple-ceph-cluster-backends-2cd90d64b10 | 12:11 |
SecOpsNinja | admin0, that was what i was thinking regarding systemd journaling and rsyslog... so the rsyslog container is useless | 12:11
admin0 | jrosser, thanks .. i think i missed the Create a new nova-secret.xml file part .. and now thinking how to do this with osa | 12:12 |
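A rough sketch of the per-compute manual step that guide describes - defining a libvirt secret for the second cluster so qemu can authenticate. The UUID, client name and keyring path below are placeholders and must match the rbd_secret_uuid / rbd_user of the second cinder backend; how to drive this through OSA variables is the open question here.

    cat > ceph2-secret.xml <<'EOF'
    <secret ephemeral='no' private='no'>
      <uuid>11111111-2222-3333-4444-555555555555</uuid>
      <usage type='ceph'>
        <name>client.cinder2 secret</name>
      </usage>
    </secret>
    EOF
    virsh secret-define --file ceph2-secret.xml
    # load the second cluster's client key into the secret
    virsh secret-set-value --secret 11111111-2222-3333-4444-555555555555 \
      --base64 "$(awk '/key/ {print $3}' /etc/ceph/ceph2.client.cinder2.keyring)"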
*** openstackgerrit has joined #openstack-ansible | 12:35 | |
openstackgerrit | Merged openstack/openstack-ansible-os_neutron master: Test OVS/OVN deployments on CentOS 8 https://review.opendev.org/762661 | 12:35 |
*** macz_ has joined #openstack-ansible | 12:39 | |
admin0 | moving from 16 -> 18, i nuke a controller and rerun the playbooks with limit .. all containers are recreated and work OK, except utility .. i get this error in utility -- https://gist.githubusercontent.com/a1git/be5353eb91260945d8b00bcd21df7b68/raw/cc330d8f96267ea79a5057e5a50b7984bc72bf46/gistfile1.txt .. looks like it still tries to setup a 16.04 version one | 12:41 |
*** macz_ has quit IRC | 12:44 | |
admin0 | SecOpsNinja, something like this in variables will do https://gist.github.com/a1git/ae3934799479e18f3e9553f7bbe7c25a | 12:53 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_neutron master: Return calico to voting https://review.opendev.org/702657 | 12:57 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_neutron master: Return calico to voting https://review.opendev.org/702657 | 12:58 |
-openstackstatus- NOTICE: The Gerrit service at review.opendev.org will be offline starting at 15:00 UTC (roughly two hours from now) for a weekend upgrade maintenance: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html | 13:01 | |
*** spatel has joined #openstack-ansible | 13:11 | |
openstackgerrit | Merged openstack/openstack-ansible master: Pin SHA of murano-dashboard so it is controlled by OSA releases https://review.opendev.org/763002 | 13:15 |
*** spatel has quit IRC | 13:16 | |
*** rh-jlabarre has joined #openstack-ansible | 13:21 | |
SecOpsNinja | thanks admin0 for the graylog config | 13:23
SecOpsNinja | atm I'm trying to find the cause of the iscsi errors in my lvm backends | 13:24
*** mgariepy has joined #openstack-ansible | 13:33 | |
openstackgerrit | Merged openstack/openstack-ansible master: Bump calico version https://review.opendev.org/762985 | 13:35 |
openstackgerrit | Andrew Bonney proposed openstack/openstack-ansible master: Bump API microversion required for Zun AIO https://review.opendev.org/763562 | 13:41 |
admin0 | SecOpsNinja, for iscsi you can use iscsi commands to check if the server is responding .. or if you can manually mount | 13:45 |
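For example (a sketch - the portal IP and IQN are placeholders for whatever cinder-volume exports):

    iscsiadm -m session -P 1                                   # existing sessions and their state
    iscsiadm -m discovery -t sendtargets -p 192.0.2.10:3260    # is the target portal reachable at all?
    iscsiadm -m node -T iqn.2010-10.org.openstack:volume-<uuid> -p 192.0.2.10:3260 --login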
openstackgerrit | Andrew Bonney proposed openstack/openstack-ansible-os_zun master: DNM: Update zun role to match current requirements https://review.opendev.org/763141 | 13:48 |
*** d34dh0r53 has joined #openstack-ansible | 13:54 | |
-openstackstatus- NOTICE: The Gerrit service at review.opendev.org will be offline starting at 15:00 UTC (roughly one hour from now) for a weekend upgrade maintenance: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html | 14:00 | |
*** macz_ has joined #openstack-ansible | 14:00 | |
*** pto has quit IRC | 14:01 | |
*** pto_ has joined #openstack-ansible | 14:01 | |
openstackgerrit | James Denton proposed openstack/openstack-ansible-os_neutron master: Add centos-8 support for ovs-dpdk https://review.opendev.org/762729 | 14:02 |
*** spatel has joined #openstack-ansible | 14:03 | |
*** macz_ has quit IRC | 14:05 | |
*** pto_ has quit IRC | 14:10 | |
spatel | question folks, I have deployed my 3 node controllers and realized that my VxLAN subnet range is wrong, so i want to change that subnet range. How difficult is it going to be to change that? This is what i am thinking. | 14:11
spatel | 1. change range in openstack_user_config.yml | 14:11 |
spatel | 2. change IP on each controller node for VxLAN bridge | 14:11 |
spatel | 3. run neutron playbook | 14:12 |
spatel | what do you think? | 14:12 |
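For reference, step 1 points at the tunnel entries in openstack_user_config.yml; a sketch with made-up example values (not spatel's actual ranges):

    cidr_networks:
      tunnel: 172.29.240.0/22            # the VXLAN endpoint subnet being changed
    used_ips:
      - "172.29.240.1,172.29.240.50"     # reserve addresses already assigned to hosts
    global_overrides:
      provider_networks:
        - network:
            container_bridge: "br-vxlan"
            container_type: "veth"
            container_interface: "eth10"
            ip_from_q: "tunnel"
            type: "vxlan"
            range: "1:1000"              # VNI range, not the IP subnet
            net_name: "vxlan"
            group_binds:
              - neutron_linuxbridge_agent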
admin0 | and delete containers that might use the br-vxlan range | 14:15 |
spatel | I don't think any container use br-vxlan | 14:18 |
spatel | only neutron-server run inside container and it doesn't care about br-vxlan | 14:19 |
spatel | correct me if i am missing something | 14:19 |
admin0 | right .. in some very old deployments, i recall seeing neutron-agents in containers | 14:20 |
admin0 | was just mentioning as a point to check | 14:21 |
spatel | admin0: sure good to know | 14:21 |
SecOpsNinja | what is the best way to manually remove a vm when nova force-delete f61e7054-2943-4302-9e1f-4883c183f090 doesn't work? i have already detached the nonexistent previously attached volume | 14:23
admin0 | set its state to available and try to delete again | 14:23 |
admin0 | if it fails, mark it as deleted in the database | 14:23
admin0 | so that it does not appear in the UI /list | 14:24 |
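A hedged sketch of that sequence (for nova instances the API state is "active" rather than "available"; the direct DB update is a last resort and the column names should be verified against your release before use):

    openstack server set --state active f61e7054-2943-4302-9e1f-4883c183f090
    openstack server delete f61e7054-2943-4302-9e1f-4883c183f090
    # last resort, after a DB backup - hide it from listings by soft-deleting the row:
    # UPDATE instances SET deleted = id, vm_state = 'deleted', deleted_at = NOW()
    #   WHERE uuid = 'f61e7054-2943-4302-9e1f-4883c183f090';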
SecOpsNinja | i don't know how, but 3 vms got into an unstable state and despite being able to delete their volumes through cinder, nova didn't update the state. I'm trying to remove them from the kvm compute node and trying to remove everything from openstack, but they are getting into an error state | 14:26
spatel | SecOpsNinja: sometimes when rabbitMQ is not happy i have noticed that kind of behavior, vms get stuck in a bad state and never recover. | 14:30
SecOpsNinja | spatel, yep i think that is what's causing the instability on my cluster, because i can't create new vms (they get stuck in creating forever and i can only start a few). so i'm trying to remove the ghost vms and disks and see if that resolves this instability | 14:32
SecOpsNinja | because through rabbitmq i wasn't able to trace the request id to see what was causing the OSError: [Errno 113] EHOSTUNREACH | 14:32
openstackgerrit | Merged openstack/openstack-ansible-os_tempest master: Add tempest cleanup support https://review.opendev.org/762405 | 14:51 |
*** miloa has quit IRC | 14:53 | |
*** rpittau is now known as rpittau|afk | 14:53 | |
ThiagoCMC | spatel, my Ceph cluster has low IOPS, I believe that it's due to the home gigabit network... lol | 14:55
ThiagoCMC | Are you aware of potential fine tuning here and there, let's say, PG numbers? | 14:55 |
spatel | Totally possible. The ideal requirement is to use 10G | 14:56
spatel | I mostly use this method - https://ceph.io/pgcalc/ | 14:56 |
ThiagoCMC | Yeah... True. I do have 2.5Gbps NIC cards but I can't find a cheap 2.5Gbps switch on the market, with 16 ports, for example | 14:56
spatel | ThiagoCMC: there are plenty of cheap switches on the market | 14:57
ThiagoCMC | Not with 16 * 2.5Gbps ports | 14:57 |
spatel | how many server you have? | 14:58 |
spatel | upgrade them to 10G nic and get 10G switch | 14:58 |
ThiagoCMC | 8 | 14:58 |
ThiagoCMC | I found a close one: https://www.anandtech.com/show/15916/at-last-a-25gbps-consumer-network-switch-qnap-releases-qsw11055t-5port-switch | 14:58 |
ThiagoCMC | But, only 5 ports lol | 14:58 |
admin0 | ebay :) | 14:58 |
ThiagoCMC | So sad :-D | 14:58 |
spatel | why are you only looking for 2.5G? | 14:59 |
ThiagoCMC | Because my NIC cars are all 2.5Gbps | 14:59 |
ThiagoCMC | cards* | 14:59 |
spatel | what is the model of server? HP/IBM/Dell? | 14:59 |
admin0 | for my demo lab, i have a single server with 256gb of ram and 4x nvme on raid0 .. and i use the same method like openstack .. use cloud-init and backing image to quickly spawn instances and delete them again .. with script .. | 15:00 |
admin0 | so a total up and down takes less than a minute to get any OS up and running | 15:00 |
ThiagoCMC | It isn't a server... haha - It's an Asus ROG motherboard top gaming PC with AMD 3950X and water cooling | 15:01 |
spatel | oh boy! | 15:01 |
spatel | i would like to see picture of it :) | 15:02 |
ThiagoCMC | I can definitely share it! hahaha | 15:02 |
ThiagoCMC | At amazon: https://www.amazon.com/gp/product/B07SYW3RT2/ref=ppx_yo_dt_b_asin_title_o09_s00?ie=UTF8&psc=1 | 15:02 |
ThiagoCMC | ASUS ROG Crosshair VIII Hero X570 | 15:02 |
ThiagoCMC | It's an awesome compute node! | 15:02 |
ThiagoCMC | LOL | 15:03 |
-openstackstatus- NOTICE: The Gerrit service at review.opendev.org is offline for a weekend upgrade maintenance, updates will be provided once it's available again: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html | 15:03 | |
admin0 | i have this same mb . no water cooling though | 15:03 |
ThiagoCMC | Nice ^_^ | 15:03 |
spatel | one day i will get that machine. | 15:03 |
ThiagoCMC | :-D | 15:03 |
spatel | I have gaming PC but its ok | 15:03 |
admin0 | good setup to play cities skylines in the side and run ubuntu desktop inside vmware | 15:04 |
ThiagoCMC | VMWare?! Ewww... lol | 15:04 |
admin0 | vmware workstation | 15:04 |
ThiagoCMC | I have Windows on QEMU accessing the NVIDIA via GPU Passthrough lol | 15:04 |
admin0 | for (gaming) reasons, cannot switch to linux completely, but cannot work on windows desktop either | 15:04 |
ThiagoCMC | Windows bare-metal? Never. | 15:05 |
ThiagoCMC | :-P | 15:05 |
admin0 | so material desktop + ubuntu inside vmware workstation | 15:05 |
admin0 | allows me to be on linux and play games | 15:05 |
admin0 | laptops ( even my old mac) is linux | 15:05 |
ThiagoCMC | Cool! | 15:05 |
spatel | I hate windows (the only reason i have it is just to play PUBG/RainbowSix/valorant/ :) | 15:05
admin0 | i am addicted to cities skylines .. just a week back , reached 90% cpu on all 16 cores .. time to upgrade to 24 | 15:06 |
ThiagoCMC | Damn LOL | 15:06 |
admin0 | my laptop is now a dedicated aio box :) | 15:06 |
ThiagoCMC | I hate Windows too... I tried Windows 10 Pro on those machines, man, it turns off randomly. I tried everything: no power saving, no screensaver, performance tuning. Nothing worked. Got Ubuntu on those babies, weeks of uptime. | 15:07
spatel | Next year planning to buy new MacBook Pro with M1 chip :) | 15:08 |
kleini | ThiagoCMC: do you have an extra SSD on Ceph OSD nodes for Ceph journal partition? | 15:08 |
*** macz_ has joined #openstack-ansible | 15:09 | |
spatel | If you have SSD then i don't think you need dedicated journal | 15:09 |
kleini | writes into Ceph return after they are written into the journal | 15:09 |
noonedeadpunk | well, if journal is on nvme in some raid 5 :p | 15:09 |
spatel | If you need more performance then you can put journal on NVMe and SSD for data | 15:09 |
kleini | okay, so the whole disc is an SSD and journal is colocated? | 15:09 |
admin0 | https://pasteboard.co/JBfx9LY.png -- 1186 hours that could have spent on openstack :D | 15:10 |
ThiagoCMC | kleini, yes, SSDs for Ceph dbs | 15:10 |
kleini | okay, we have normal SATA SSDs for non-colocated journal and performance is really good | 15:10 |
ThiagoCMC | admin0, LOLOL | 15:11 |
ThiagoCMC | kleini, what is your network speed? | 15:12 |
*** macz_ has quit IRC | 15:13 | |
spatel | This is my ceph network diagram - https://ibb.co/fxstpx1 | 15:15 |
spatel | i had performance issue with EVO which i have replaced with Enterprise SSD (spent $8000) | 15:15 |
ThiagoCMC | Nice! | 15:17 |
kleini | 1G copper between compute nodes and Ceph OSDs | 15:19 |
ThiagoCMC | :-O | 15:19 |
ThiagoCMC | bonded? | 15:19 |
kleini | I have full 100MB/s write speed in VMs in OpenStack | 15:19 |
kleini | nothing bonded at all | 15:19 |
ThiagoCMC | Hmm | 15:20 |
kleini | plain RJ45 connection | 15:20 |
spatel | ThiagoCMC: here is some performance comparison EVO vs PM - http://paste.openstack.org/show/800257/ | 15:20 |
ThiagoCMC | Have you tested it with `fio`, how much IOPS? | 15:20 |
ThiagoCMC | spatel, thanks for sharing this! | 15:20 |
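A typical fio invocation for that kind of IOPS check (a sketch - the target device is an example and will be overwritten, so point it at a scratch volume inside a test VM):

    fio --name=randwrite --filename=/dev/vdb --rw=randwrite --bs=4k \
        --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
        --runtime=60 --time_based --group_reporting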
spatel | noonedeadpunk: jrosser curious why do we need this anymore - http://paste.openstack.org/show/800258/ ? | 15:26 |
spatel | br-vxlan not attached to any container in current deployment | 15:26 |
noonedeadpunk | it doesn't need to be a bridge, but you need to have an interface for vxlan operation | 15:26
noonedeadpunk | at least for lxb | 15:26 |
spatel | Yes we need br-vxlan bridge interface on controller/compute but we don't need that block of code in openstack_user_config.yml right? | 15:27 |
ThiagoCMC | True, but it doesn't have to be declared under "provider_networks:" at user config | 15:27 |
spatel | it has no use anywhere | 15:27 |
noonedeadpunk | it has | 15:27 |
ThiagoCMC | I don't have the br-vxlan declared under my "provider_networks:" | 15:28 |
spatel | where? | 15:28 |
noonedeadpunk | since you get neutron ml2 config out of this | 15:28 |
ThiagoCMC | There is another way | 15:28 |
spatel | now the neutron agent is outside the container, so that block of code is doing nothing. | 15:29
ThiagoCMC | For example: http://paste.openstack.org/show/800259/ - user vars | 15:29 |
noonedeadpunk | well yes, it's true | 15:29 |
noonedeadpunk | spatel: it defines default for vars ThiagoCMC pasted | 15:30 |
ThiagoCMC | I also have: http://paste.openstack.org/show/800260/ | 15:30 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/library/provider_networks | 15:30 |
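For illustration, the kind of user_variables override being discussed looks roughly like this (key names as consumed by the os_neutron templates; ThiagoCMC's actual paste may differ and the values here are examples):

    neutron_provider_networks:
      network_types: "vxlan"
      network_vxlan_ranges: "1:1000"
      network_vlan_ranges: "physnet1:100:200"
      network_mappings: "physnet1:br-vlan"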
spatel | thanks noonedeadpunk | 15:33 |
spatel | ThiagoCMC: why are you using 224.0.0.0 range ? i think you should use 239.0.0.0 to 239.255.255.255 | 15:34 |
spatel | as a best practice (it doesn't matter but sometime it can create issue) | 15:34 |
spatel | https://en.wikipedia.org/wiki/Multicast_address | 15:35 |
noonedeadpunk | like for metal deployments you don't need provider_networks at all, but again - it all depends. | 15:35
noonedeadpunk | I prefer having it defined and not bother myself with defining specific overrides when the deployment allows doing so | 15:36
spatel | noonedeadpunk: totally understand. just trying to understand how these lego pieces are connected, so if i pull one out..nothing should break | 15:36
noonedeadpunk | well yes, if you're covered with specific definitions it can be dropped | 15:36
ThiagoCMC | spatel, it was jamesdenton a long time ago that configured this for me, I was facing network problems and that fixed it! | 15:37 |
jamesdenton | O_O | 15:37 |
ThiagoCMC | lol | 15:37 |
noonedeadpunk | xD | 15:37 |
spatel | related multicast address or vxlan bridge stuff? | 15:37 |
ThiagoCMC | I don't remember exactly what was happening with the vxlan connectivity... | 15:38 |
spatel | jamesdenton: what do i need to do if i want to deploy a dedicated l3-agent metal node in OSA? | 15:40
spatel | Lets say i deploy 5 dedicated nodes for traffic load-balancing, how does that work with l3-agent? | 15:41
jamesdenton | 5 network nodes? | 15:41 |
spatel | yes | 15:42 |
spatel | not neutron-server | 15:42 |
jamesdenton | right, ok | 15:42 |
spatel | just for SNAT and all high traffic volume router | 15:42 |
jamesdenton | you would get l3 agent on each of them, but any router would only be scheduled/active to one at any given time. | 15:42 |
jamesdenton | so make the router HA and let vrrp do its thing | 15:42 |
jamesdenton | but over time, the distribution may not be even | 15:43 |
spatel | so it's not like all 5 nodes will be active and working together | 15:43
jamesdenton | no | 15:43 |
spatel | hmm | 15:43 |
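The HA-router approach jamesdenton describes is set per router (a sketch; the flags are admin-only and the router name is an example):

    openstack router create --ha tenant-router
    # converting an existing router requires taking it down first:
    openstack router set --disable tenant-router
    openstack router set --ha tenant-router
    openstack router set --enable tenant-router
    openstack network agent list --router tenant-router --long   # which l3 agents host it (active/standby)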
ThiagoCMC | What about when with OVS' OVN? | 15:44 |
*** watersj has joined #openstack-ansible | 15:44 | |
spatel | I have a vlan provider in my datacenter but i am thinking of running a k8s cluster and that doesn't work with the vlan provider (it needs tenant networks, so i have to deploy l3-agent for them) | 15:45
watersj | for those with limited nics doing bond, what mode are you using? | 15:45 |
watersj | and or suggest? | 15:46 |
spatel | watersj: if your switch supports MLAG then active+active mode=4 | 15:46
spatel | layer2+3 | 15:46 |
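As a concrete example of that bond (a netplan sketch, assuming Ubuntu and example NIC names; the switch side needs LACP/MLAG configured to match):

    network:
      version: 2
      ethernets:
        eno1: {}
        eno2: {}
      bonds:
        bond0:
          interfaces: [eno1, eno2]
          parameters:
            mode: 802.3ad                  # "mode=4"
            lacp-rate: fast
            transmit-hash-policy: layer2+3
            mii-monitor-interval: 100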
jamesdenton | spatel OVS/DVR is better suited to distributing North-South traffic | 15:47 |
spatel | truly speaking i never convinced myself to try DVR :( | 15:48 |
spatel | i don't know anyone out there using DVR on a large cloud | 15:48
spatel | I tried and ended up troubleshooting day and night, i might have been new at that time but it has lots of pieces to look after | 15:49
jamesdenton | it works, but there are some incompatibilities (allowed address pairs being the main one, currently) | 15:49
spatel | you know what, it eats up a lot of my public IPs :( | 15:50
jamesdenton | yes, it will do that | 15:50 |
spatel | then finally i decided to get out from that mess and use vlan provider | 15:50 |
spatel | i think i talked to you first time when i join this IRC 2 years ago :) | 15:51 |
spatel | cloudnull & you :) | 15:51 |
admin0 | xenial -> bionic -- how to force a new container to be in ubuntu 18 and not 16 ? | 15:51 |
cloudnull | wat!? | 15:51 |
spatel | you are here :) | 15:51 |
admin0 | he is always here :D | 15:51 |
admin0 | \o/ | 15:51 |
spatel | silent majority.. haha | 15:52 |
admin0 | you just need to ping them :) | 15:52 |
jamesdenton | Gerrit is down, so cloudnull has time to play | 15:52 |
cloudnull | unpossible | 15:52 |
cloudnull | there's no playing in cloud | 15:52 |
spatel | you helped me a lot to build my first cloud on pike and its rock solid, its been 2.5 years and 99.999 SLA :) | 15:52 |
cloudnull | how has everyone been ? | 15:52 |
jamesdenton | no complaints | 15:55 |
admin0 | spatel, what i do is also make compute nodes as network nodes | 15:55 |
spatel | cloudnull: waiting for the COVID-19 patch to be released :) | 15:55
admin0 | that way, all traffic is distributed | 15:55 |
admin0 | and one node going down does not take away the whole | 15:56 |
spatel | admin0: what is the advantage of doing that? | 15:56 |
admin0 | no centralized point of failures | 15:56 |
spatel | that is called DVR deployment right? | 15:56 |
admin0 | its not using DVR per se | 15:56 |
spatel | I have 300 compute nodes in my cloud.. that would be very odd don't you think | 15:57 |
admin0 | my biggest is 1/2 of that, but its not that odd | 15:57 |
ThiagoCMC | With OVN, all compute nodes can easily also be network nodes | 15:57 |
ThiagoCMC | If I'm not wrong lol - it's an efficient topology | 15:58 |
admin0 | only when ovn is approved and battle tested .. right now, this works for both lb and ovs | 15:58 |
ThiagoCMC | Yep, I have the same topology that ovn provides but, with linux bridges | 15:58 |
spatel | admin0: let me ask a stupid question, if i turn every compute node into a network node and create tenant-1, in that case it will pick 1 HA pair to handle that tenant's traffic, right? | 16:00
admin0 | if HA is enabled, yes | 16:01 |
spatel | if HA not enable then? | 16:01 |
admin0 | then the router will be in one of the compute node in the cluster | 16:01 |
admin0 | or you can disable l3 agents on some nodes and force some compute nodes to be more dedicated to network functions | 16:01 |
spatel | so if i have many many tenants then my workload will be distributed, right, but if i have only 1 or 2 tenants then it doesn't make sense | 16:02
admin0 | right | 16:02 |
*** macz_ has joined #openstack-ansible | 16:02 | |
spatel | Thanks for confirming that. | 16:02 |
admin0 | the ones i managed was a big public cloud provider with over 15000 tenants | 16:02 |
admin0 | so a dedicated network node in such scale did not make sense | 16:02 |
admin0 | the more distributed, the better | 16:02 |
spatel | how many total compute node you have to handle 15000 tenants? | 16:02 |
admin0 | it grew upto 350 eventually | 16:03 |
admin0 | but i left and moved on, so no idea of current | 16:03 |
spatel | with 3 controller node? | 16:03 |
spatel | 15000 is pretty big size :) | 16:03 |
admin0 | 3 (main) controller nodes, but additional 3 to spawn up more network server instances | 16:03 |
spatel | I have 3 controller node with 326 compute nodes. | 16:04 |
spatel | now i am building a new cloud with 6 controller nodes to make it more resilient | 16:05
admin0 | for a public cloud provider, where you go to events and give out 15 day test accounts where people can come in and play, that was good planning to handle customers who might try to create a lot of networks | 16:11
admin0 | there were 3 main controllers ( for haproxy ) and other services, but we had to add 3 extra nodes for neutron server | 16:11
admin0 | neutron server was a bottleneck | 16:11 |
SecOpsNinja | one quick question: i have my cinder storage using lvm, but from what i'm seeing some volumes aren't available in cinder storage, so the compute node is complaining about the target not existing. how can i remove this ip-*.*.*.*:3260-iscsi-iqn.2010-10.org.openstack:volume-aca70400-6f16-4534-a5b0-596d0b1fa5a2-lun-1 in /dev/disks/by-paths? | 16:15
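A sketch of cleaning up such a stale target from the compute node (substitute the real portal IP from the by-path name; only do this once nothing is using the device):

    iscsiadm -m node -T iqn.2010-10.org.openstack:volume-aca70400-6f16-4534-a5b0-596d0b1fa5a2 \
             -p <portal-ip>:3260 --logout
    iscsiadm -m node -T iqn.2010-10.org.openstack:volume-aca70400-6f16-4534-a5b0-596d0b1fa5a2 \
             -p <portal-ip>:3260 --op delete
    # if a leftover sdX block device lingers, it can be dropped with:
    # echo 1 > /sys/block/sdX/device/delete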
*** gyee has joined #openstack-ansible | 16:32 | |
ThiagoCMC | How to make use of the local storage of the compute nodes, when deploying OSA with Ceph? Right now, I see no option to launch an Instance outside of the RBD "vms" pool. Even when selecting "Create Image: no", any idea? | 16:36
*** macz_ has quit IRC | 16:39 | |
*** macz_ has joined #openstack-ansible | 16:40 | |
*** pcaruana has quit IRC | 16:40 | |
*** mgariepy has quit IRC | 16:41 | |
*** watersj has quit IRC | 16:48 | |
spatel | Do you have following in nova.conf ? | 16:50 |
-spatel- images_rbd_pool = vms | 16:50 | |
-spatel- images_type = rbd | 16:50 | |
spatel | if yes then nova by default puts your VMs on Ceph | 16:51
spatel | ThiagoCMC: ^^ | 16:51 |
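For context, the relevant nova.conf section looks roughly like this when ceph-backed ephemeral is on (uuid and user are placeholders); dropping images_type/images_rbd_pool makes nova fall back to local qcow2 files under /var/lib/nova/instances:

    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = 11111111-2222-3333-4444-555555555555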
kukacz | admin0: in that public cloud setup you've mentioned, with compute node = network node, which backend was used? did I understand correctly that HA was used but no DVR? | 16:54
admin0 | linuxbridge | 16:55 |
admin0 | HA in routers was not used .. | 16:55 |
kukacz | admin0: (I'm looking for inspiration what's possible and proven in production. for long years we've been on Contrail/Tungsten and lost track of ML2/OVS/...) | 16:56 |
*** tosky has quit IRC | 16:56 | |
admin0 | simple = easy to maintain, manage, grow | 16:58 |
kukacz | admin0: thanks. using linuxbridge, is it possible to flexibly implement multiple (tens of) tenant-bound provider networks? | 16:59
admin0 | based on how big range you give on vxlan, it will work | 16:59 |
admin0 | but one thing i found was how vxlan is implemented in neutron | 17:00 |
admin0 | for example, in one setup i put 100000:10000000 .. something like that .. super long range | 17:00 |
admin0 | but i found that during upgrades, neutron took like 30 mins to come up | 17:00
admin0 | found that the vxlan table is pre-populated by neutron without any indexing | 17:00
admin0 | so if you give say 1 million vxlan range, you will have a neutron table with 1 million entries | 17:01 |
admin0 | even if you are only using 10 vxlan networks | 17:01 |
admin0 | and it makes neutron come up very very slow | 17:01 |
admin0 | had to manually readjust the range in setting and truncate records | 17:01 |
jamesdenton | yikes! sounds bug worthy | 17:02 |
kukacz | admin0: cool :-) that's always a pain, to discover such hidden internals | 17:02 |
jamesdenton | kukacz if i were deploying today, i would settle on ML2/OVS w/ HA routers (at a minimum) and wait for migration to OVN down the road | 17:02 |
kukacz | admin0: that was an OSA environment? | 17:03 |
jamesdenton | but depending on your use case, straight provider networks w/o tenant networks/routers may be better option | 17:03 |
admin0 | kukacz, if for public cloud, you can offer 2 choices - .. floating ip and direct IP | 17:04 |
admin0 | direct IP means people get direct public IP from dhcp | 17:04 |
admin0 | no need to create routers or networks | 17:04 |
admin0 | and people are happy ( cpanel, directmin, windows) for licensing purpose | 17:04 |
kukacz | jamesdenton: thanks! we're a multitenant service provider cloud. need to deliver each customer their own routed network and bind it to a set of their projects, enable their subnet pools to be distributed via FIPs etc. | 17:04 |
admin0 | cons - if they delete the server, the IP is also gone | 17:04 |
admin0 | so in real usage, direct ip is used a lot more than floating ip .. because of the simplicity .. so no longer 3 dhcp, 2 routers per network | 17:05 |
*** macz_ has quit IRC | 17:06 | |
kukacz | admin0: direct IP is what we're using currently, though it's due to a Tungsten Fabric limitation. customers ask for FIPs, as those are usually part of shared Ansible playbooks they use for orchestration etc. | 17:07 |
admin0 | i meant to offer both .. like net-floating net-direct etc names | 17:08 |
admin0 | so customers have a choice to have both | 17:09 |
ThiagoCMC | spatel, that's exactly what I have at nova.conf! | 17:13 |
kukacz | admin0, jamesdenton: thanks for your inputs! | 17:13 |
jamesdenton | any time | 17:13 |
spatel | kukacz: my reason to use direct IP is performance. (NAT is a big bottleneck for network throughput) | 17:14
ThiagoCMC | IPv6 rocks! :-P | 17:14 |
ThiagoCMC | spatel, so, how to put the VM's disks at the local storage at the compute node (instead of the default, ceph)? | 17:15 |
kukacz | spatel: also using direct IPs from provider(external) networks? | 17:16 |
spatel | ThiagoCMC: i don't think there is a way to do that; you have only two choices: remove rbd from nova.conf and then nova defaults to local disk, or use a cinder boot volume where you can pick whether you want to boot from local or cinder | 17:17
admin0 | kukacz, https://www.openstackfaq.com/openstack-add-direct-attached-dhcp-ip/ | 17:17 |
admin0 | this range can be a vlan range provided by the datacenter .. all you need is a router .1 | 17:17 |
spatel | kukacz: this is what i have - https://satishdotpatel.github.io//build-openstack-cloud-using-openstack-ansible/ | 17:18 |
ThiagoCMC | spatel, oh no! Thanks for the info, I'll research more into this. | 17:18
spatel | Router is my physical router for all my VLAN | 17:18 |
spatel | kukacz: no NAT anywhere. high speed networking | 17:19 |
admin0 | ThiagoCMC, if you remove nova_libvirt_images_rbd_pool from user_variables and use host overrides to only set it on selected servers, you can have some servers using ceph for vms and the rest using local disk | 17:19
spatel | we have 100gbps traffic coming in/out which i don't think any server based network node can handle :) | 17:20 |
ThiagoCMC | admin0, interesting, thanks! | 17:22 |
ThiagoCMC | I found people talking about this here: https://bugzilla.redhat.com/show_bug.cgi?id=1303814 | 17:22 |
openstack | bugzilla.redhat.com bug 1303814 in rhosp-director "[RFE] ephemeral storage for multiple back end" [Medium,Closed: wontfix] - Assigned to rhos-maint | 17:22 |
admin0 | that too .. if you have big nodes and need multi gb connectivity, ovs it is .. but if you have older smaller nodes where the number of instances will not be a lot ( based on the flavors), lb | 17:22
admin0 | lb vs ovs has some calculations | 17:22 |
*** MickyMan77 has quit IRC | 17:23 | |
kukacz | spatel: thanks for sharing that nice guide | 17:24 |
*** klamath_atx has quit IRC | 17:24 | |
kukacz | admin0: our computes are typically running 50-70 instances | 17:24 |
spatel | kukacz: 70vms wow! that is a lot of VM | 17:25 |
admin0 | :) just this morning i was trying osa + multiple ceph clusters .. i managed to get cinder to support the 2nd ceph .. so i can create volumes .. but nova does not yet support multi ceph ( it cannot even mount the volumes from the 2nd ceph) | 17:25
admin0 | so i have a setup where users can create volumes on the 2nd ceph, but no way to mount them | 17:25 |
kukacz | the biggest challenge is handling (routing) the large number of customer networks as provider networks. at best (currently) we need them BGP routed from the datacenter network | 17:26
admin0 | ebgp with the dc ? | 17:26 |
kukacz | admin0: bgpvpnaas is what we're considering now | 17:27 |
admin0 | how do you add networks ? is it one network with multiple ranges, or do you add a new provider with every subnet ? | 17:27 |
*** mgariepy has joined #openstack-ansible | 17:30 | |
spatel | I am also interested to learn how people run BGP with openstack networking | 17:31 |
kukacz | admin0: with the current contrail setup, it's just not marked as a provider network. it's a common neutron network which is routed by contrail via mpls tunnels towards the physical edge routers | 17:31
kukacz | on edge router, there's a VRF per tenant. multiple subnets usually, pairing 1:1 with openstack networks or multiple subnets per 1 openstack network, both is possible | 17:33 |
kleini | ThiagoCMC: https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/user_variables.yml.prod-ceph.example#L28 just omit this line in your user_variables.yml and your compute nodes will create ephemeral as local qcow2 images based on downloaded images from Glance/Ceph | 17:34 |
kleini | we use that to have the fastest possible ephemeral storage. And if you want storage on Ceph, just create it as a volume. So you have choice, when spawning VM | 17:35 |
ThiagoCMC | kleini, that's exactly what I want!!! | 17:35 |
ThiagoCMC | I'll try it now. | 17:35 |
ThiagoCMC | Thank you! | 17:35 |
kleini | multiple local NVMe PCIe 3.0x4 SSDs with ZFS on top and disabled sync, so every write returns when written into ZFS ARC. sync to discs runs async | 17:37 |
ThiagoCMC | Sounds awesome! I also have NVMe SSDs at my compute nodes, doing thing. I really want to use them as fast ephemeral, and then, Ceph volumes for my lovely data. | 17:38 |
kleini | spatel: I am running about 200 VMs on each compute node, with 128 cores and 1TB memory | 17:38 |
ThiagoCMC | Cool! | 17:38 |
spatel | kleini: 128 cores? (overcommit vCPU?) | 17:38 |
kleini | AMD Epycs 7702P with DDR4-3200 rock everything | 17:39 |
ThiagoCMC | So, after commenting out the line: "nova_libvirt_images_rbd_pool", which playbooks should I run, just "os-nova-install.yml" ? | 17:39 |
kleini | 1:4 vcpu overcommit | 17:39 |
ThiagoCMC | Or better go with "setup-everything.yml"? | 17:39 |
kleini | os-nova-install.yml should be enough | 17:40 |
admin0 | if everythign is already setup, just nova setup | 17:40 |
ThiagoCMC | ok | 17:40 |
spatel | ThiagoCMC: nova playbook | 17:40 |
kleini | qcow2 images are then stored in /var/lib/nova/instances | 17:40 |
ThiagoCMC | Just like old days | 17:40 |
admin0 | i am stuck on 16.04 -> 18.04 .. my utility container wants to be created with 16.04 and fails .. anyone recall how they were able to fix it ? | 17:40
*** jbadiapa has quit IRC | 17:41 | |
kleini | the container itself? | 17:41 |
admin0 | yep | 17:47 |
admin0 | i nuked c1 (controller1) , got it up on ubuntu 18.04 ( c2 and c3 still on ubuntu 16.04) .. removed the facts, reran setup-hosts -l c1 and then lxc-containers-create -l c1_* | 17:48
admin0 | kleini, this is the error i get: https://gist.github.com/a1git/be5353eb91260945d8b00bcd21df7b68 | 17:48 |
admin0 | https://gist.githubusercontent.com/a1git/be5353eb91260945d8b00bcd21df7b68/raw/cc330d8f96267ea79a5057e5a50b7984bc72bf46/gistfile1.txt | 17:49 |
admin0 | nuked c2 sorry .. not c1 | 17:49 |
admin0 | is the util container used to do anything ? | 17:52 |
*** yasemind34 has quit IRC | 17:54 | |
*** yann-kaelig has joined #openstack-ansible | 17:56 | |
admin0 | maybe when all the controllers are on 18, this will fix itself | 18:01
ThiagoCMC | BTW, just curious about another thing... How mature is the systemd-nspawn deployment? | 18:01 |
kleini | admin0: I read a lot about magic that needs to be done on the repo containers. The error message looks like something is missing on the repo containers | 18:05
kleini | admin0: magic that needs to be done on the repo containers when you upgrade xenial->bionic | 18:06
*** SecOpsNinja has left #openstack-ansible | 18:08 | |
kleini | ThiagoCMC: does not work any more. I think it was removed in the U release. I use systemd-nspawn a lot as it is very nicely integrated into the host's systemd, but for OSA it would need to be re-implemented. Especially the network connection of containers is just done by a list of network commands in a systemd unit. Instead it should be done today with systemd-networkd and systemd-nspawn | 18:09
jrosser | ThiagoCMC: the nspawn deployment is basically deprecated because there is no one to maintain it | 18:09 |
ThiagoCMC | Oh, I see... Ok then =P | 18:10 |
*** tosky has joined #openstack-ansible | 18:10 | |
ThiagoCMC | Looking forward to try LXD instead! | 18:10 |
kleini | admin0: do you follow this https://docs.openstack.org/openstack-ansible/rocky/admin/upgrades/distribution-upgrades.html guide? a lot of things in there regarding disabling repo containers and haproxy redirection | 18:10 |
kleini | ThiagoCMC: I use Linux Bridge and LXC in control plane while using OVS and ML2 OVS on compute and network nodes | 18:11 |
ThiagoCMC | Cool, I did that in the past as well. But today I'm 100% on Linux Bridges | 18:12 |
jrosser | admin0: at 16->18 upgrade time you need to make sure that the repo server that the venvs get built on is the one which is running 18.04 | 18:12 |
ThiagoCMC | Thing is, my Controllers are QEMU VMs, with OSA's Containers within them. I would like to make my Controllers, LXD (instead of QEMU), and make OSA's Containers nested. | 18:13 |
jrosser | admin0: if it turns out one of the old 16.04 repo servers is being used for venv building you can override the automatic selection with this variable https://github.com/openstack/ansible-role-python_venv_build/blob/master/defaults/main.yml#L121 | 18:13 |
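One quick way to confirm which repo hosts are on which release before rebuilding venvs (a sketch, run from the deploy host so OSA's dynamic inventory is picked up):

    cd /opt/openstack-ansible/playbooks
    ansible repo_all -m setup -a 'filter=ansible_distribution_release'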
jrosser | ThiagoCMC: the 'surface area' which we can support is pretty much related to the number of contributors we have | 18:14 |
jrosser | currently whilst i have POC patches for LXD i have no requirement for that in $dayjob so other things will get my attention ahead of those | 18:15 |
ThiagoCMC | Sure, all OSA contributors are really awesome! I wanna try the LXD patches, can you send me the link again? | 18:16 |
jrosser | and thats pretty much the story with nspawn, no-one who is actively contributing is using it | 18:16 |
ThiagoCMC | I see, makes sense | 18:16 |
jrosser | gerrit is offline for upgrade so you can't see them right now | 18:17 |
ThiagoCMC | Ok | 18:17 |
jrosser | but i did start the new ansible roles https://github.com/jrosser/openstack-ansible-lxd_hosts | 18:17 |
jrosser | https://github.com/jrosser/openstack-ansible-lxd_container_create | 18:18 |
jrosser | they are very rough, but i did get as far as setup_hosts completing | 18:18 |
jrosser | the way the containers are initialised is totally different, and would use cloud-init | 18:19 |
ThiagoCMC | Cool! Yep, I really love LXD | 18:20 |
ThiagoCMC | I can see it improving OSA deployments | 18:20 |
jrosser | and the addition of networks and mounts should be doable either via the lxd api or cli, so there is really quite some work to do to get equivalence with all the options possible right now with the lxc role | 18:20 |
ThiagoCMC | Sure | 18:20 |
jrosser | elsewhere i use lxd profiles for this sort of thing | 18:21 |
ThiagoCMC | Me too | 18:21 |
jrosser | and that would be neat, managing profiles for particular containers | 18:21 |
ThiagoCMC | My Compute Nodes and Ceph OSDs are actually LXD containers (bare-metal though) | 18:21 |
jrosser | but thats a bit of a contradiction with "add mount X to container Y" which would maybe imply a 1:1 profiles:containers setup | 18:22 |
jrosser | so i'm really not at all sure what the best way to duplicate what the lxc stuff does using the features in LXD | 18:22 |
ThiagoCMC | Hmm... Kinda weird | 18:22 |
ThiagoCMC | Might be better to research more and do things the "LXD-way" only... Maybe even using LXD Cluster features somehow | 18:24 |
jrosser | oh and also lxd<>snap makes me very nervous | 18:24 |
ThiagoCMC | LOL | 18:24 |
kleini | have an nice weekend | 18:25 |
ThiagoCMC | You too! | 18:25 |
jrosser | we download specific snaps and use the --dangerous flag to manually install them | 18:25 |
*** andrewbonney has quit IRC | 18:25 | |
jrosser | then there is no danger of asynchronous auto-upgrades | 18:25 |
ThiagoCMC | jrosser, maybe one day, LXD without snap: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768073 | 18:26 |
openstack | Debian bug 768073 in wnpp "ITP: lxd -- The Linux Container Daemon" [Wishlist,Open] | 18:26 |
ThiagoCMC | Soon as Debian releases LXD without Snap, I'm moving our from Ubuntu. heheeh | 18:27 |
ThiagoCMC | *moving out | 18:28 |
jrosser | yeah well that will be an interesting discussion at canonical about if they follow the upstream debian approach for LXD | 18:29 |
jrosser | i expect it will be a quick "no" | 18:29 |
ThiagoCMC | Yep | 18:29 |
ThiagoCMC | But Snap is... Creepy. | 18:29 |
*** gouthamr_ has quit IRC | 18:30 | |
jrosser | right - we had an auto-update install the same LXD bug pretty much simultaneously on a bunch of stuff running H/A | 18:30 |
jrosser | which broke it all | 18:30 |
ThiagoCMC | LOL | 18:30 |
ThiagoCMC | damn | 18:30 |
jrosser | indeed, thats when we switched to manual snap downloads and --dangerous to install them offline | 18:31 |
jrosser | snapd doesnt know where they are from so can't update them | 18:31 |
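The pattern described is roughly this (channel is an example; the unasserted install is what stops snapd refreshing it behind your back):

    snap download lxd --channel=4.0/stable      # fetches lxd_<rev>.snap (+ .assert) locally
    snap install ./lxd_*.snap --dangerous       # sideloaded/unasserted, so no auto-refresh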
ThiagoCMC | Right!? You can turn off automatic downloads even on an iPhone... | 18:31 |
*** gouthamr_ has joined #openstack-ansible | 18:46 | |
*** jamesdenton has quit IRC | 18:57 | |
*** jamesden_ has joined #openstack-ansible | 18:57 | |
*** tosky has quit IRC | 19:13 | |
*** luksky has quit IRC | 19:50 | |
*** luksky has joined #openstack-ansible | 19:50 | |
*** yann-kaelig has quit IRC | 19:55 | |
spatel | folks, what nova filters are you guys using in your cloud? are you selective, or do you put every available filter in nova_scheduler_default_filters: ? | 20:12
spatel | any performance issue if i fill that list with all of the filters? | 20:12 |
*** klamath_atx has joined #openstack-ansible | 20:22 | |
admin0 | its evaluated when a new instance is created | 20:27 |
admin0 | and is usually very fast | 20:27 |
admin0 | unless you spawn like 100+ instances simultaneously, should not be that much of an issue | 20:28 |
admin0 | but use only what you need | 20:28 |
spatel | admin0: thanks | 20:30 |
admin0 | for example, unless you know exactly that you need it, you don't need NUMATopologyFilter | 20:31
admin0 | ok all .. have a great weekend | 20:33 |
spatel | thanks admin0 | 20:39 |
spatel | i do use NUMA so i need that filter, but i get what you're saying | 20:39
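For illustration, a trimmed user_variables.yml override along those lines (the filter names are standard nova scheduler filters; the exact default list and format depend on the OSA branch in use):

    nova_scheduler_default_filters: "AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter"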
spatel | jrosser: do we have any documentation about ansible tags? I meant list of tags we can use? | 20:42 |
spatel | or just dig into roles and find out ? | 20:42 |
ThiagoCMC | admin0, enjoy your weekend too buddy! | 20:46 |
*** klamath_atx has quit IRC | 20:48 | |
spatel | where you folks located ? | 20:51 |
ThiagoCMC | I'm in Canada, about 100km from Toronto | 20:54 |
spatel | nice! | 20:56 |
spatel | I am getting a very strange issue. i am trying to add a huge page compute node and define that in a flavor, but when i launch an instance i get the error No valid host found | 20:58
spatel | do we need to do anything else? | 20:58 |
*** klamath_atx has joined #openstack-ansible | 21:08 | |
*** klamath_atx has quit IRC | 21:14 | |
*** alvinstarr has joined #openstack-ansible | 21:15 | |
*** klamath_atx has joined #openstack-ansible | 21:16 | |
*** klamath_atx has quit IRC | 21:20 | |
*** klamath_atx has joined #openstack-ansible | 21:21 | |
jamesden_ | spatel anything interesting in nova conductor log? | 21:25 |
spatel | jrosser: nova conductor just saying No available host | 21:26 |
spatel | jamesden_: ^^ | 21:26 |
spatel | I found issue | 21:26 |
spatel | my flavor had memory size not power of 2 | 21:26 |
spatel | 8193 instead of 8192 | 21:26 |
spatel | looks like huge pages need a proper size, but it's the first time i have seen this kind of issue | 21:27
*** klamath_atx has quit IRC | 21:27 | |
*** jamesden_ is now known as jamesdenton | 21:29 | |
jamesdenton | nice find | 21:29 |
spatel | jamesdenton: one Redhat KB article says - Nova requires that the memory indicated on a HugePages-enabled flavor be a direct multiple of the actual HugePage size. | 21:32
jamesdenton | well, i think that makes sense. if your page size is 1G then it needs to be some multiple of 1024. Same for 4k or 2M. never an odd number, i guess | 21:34
spatel | this is not documented anywhere... hmm | 21:41 |
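A flavor that satisfies that rule looks like this (names and sizes are examples; 8192 MB is an exact multiple of both 2M and 1G pages):

    openstack flavor create --vcpus 4 --ram 8192 --disk 40 \
      --property hw:mem_page_size=large m1.hugepage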
*** klamath_atx has joined #openstack-ansible | 21:42 | |
ThiagoCMC | kleini, thank you so much for helping me to create Instances at the local storage while having the option to attach Ceph Volumes into them!! THANK YOU!!! THANK YOU!!! | 21:44 |
ThiagoCMC | I'm getting there... lol | 21:47 |
ThiagoCMC | I can't wait to update my cloud to Victoria! :-D | 21:51 |
spatel | ThiagoCMC: what was the solution ? | 21:51 |
*** klamath_atx has quit IRC | 21:51 | |
ThiagoCMC | spatel, comment the following line: https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/user_variables.yml.prod-ceph.example#L28 | 21:52 |
spatel | oh, that is what i told you - to remove that from nova.conf - but yes you can do it via user_variables also | 21:52
ThiagoCMC | Now my Instances run on local NVMe PCIe storage (no RAID1 though), while I can attach Ceph volumes to them, pretty much like Amazon EC2. | 21:52
ThiagoCMC | :-P | 21:53 |
ThiagoCMC | I'm so happy! LOL | 21:53 |
spatel | In my case i have created a per-compute-node file, so i can turn that on/off per node | 21:53
ThiagoCMC | Hmm... I see, like host aggregates? | 21:53 |
*** klamath_atx has joined #openstack-ansible | 21:54 | |
spatel | in /etc/openstack_deploy/host_var/compute-1.yml | 21:54 |
spatel | in /etc/openstack_deploy/host_var/compute-2.yml | 21:54 |
ThiagoCMC | Nice | 21:54 |
ThiagoCMC | Got it | 21:54 |
spatel | in compute-1.yml i have added nova_libvirt_images_rbd_pool: vms | 21:54 |
ThiagoCMC | that's neat | 21:54 |
spatel | and created an aggregate filter so if someone says they need shared storage for live migration then their VM will end up on a ceph disk | 21:55
spatel | basic grouping... | 21:55 |
spatel | user_variables.yml is a global file so i'm trying to keep most of the stuff in node-specific files. | 21:56
spatel | also you can create groups in the inventory and create a single yml file like all_ceph_compute.yml and put the nova_libvirt_images_rbd_pool: vms option in it.. | 21:57
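Putting spatel's scheme together as a sketch (the conventional directories are host_vars/ and group_vars/ under /etc/openstack_deploy; the aggregate and flavor names are examples):

    # /etc/openstack_deploy/host_vars/compute-1.yml
    nova_libvirt_images_rbd_pool: vms          # this node boots instances from ceph

    # optional: steer "shared storage" requests to those nodes via an aggregate
    # (needs AggregateInstanceExtraSpecsFilter enabled in the scheduler)
    openstack aggregate create --property sharedstorage=true ceph-backed
    openstack aggregate add host ceph-backed compute-1
    openstack flavor set --property aggregate_instance_extra_specs:sharedstorage=true m1.shared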
ThiagoCMC | That's really cool, I'll do the same here =) | 21:57 |
*** rh-jlabarre has quit IRC | 22:12 | |
*** spatel has quit IRC | 22:18 | |
*** nurdie has quit IRC | 22:37 | |
*** nurdie has joined #openstack-ansible | 22:37 | |
*** nurdie has quit IRC | 22:42 | |
*** luksky has quit IRC | 22:50 | |
*** luksky has joined #openstack-ansible | 23:03 | |
*** klamath_atx has quit IRC | 23:34 | |
*** klamath_atx has joined #openstack-ansible | 23:35 | |
*** nurdie has joined #openstack-ansible | 23:43 | |
*** nurdie has quit IRC | 23:48 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!