derekokeeffe85 | Morning noonedeadpunk (if you're on that is) | 08:16 |
---|---|---|
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Update to cirros 0.6.2 https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/886165 | 09:08 |
ncuxo | hmm so the playbook is installing all the services in lxc containers? | 10:07 |
ncuxo | https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/run-playbooks.html the first playbook setup-hosts.yml is preparing the target hosts. If that is true then why do I need to prepare the host beforehand? All you should need beforehand is just to put your ssh keys inside the target host and that's it. Or am I missing a point here? | 10:14 |
jrosser | ncuxo: there are some pre-requisites, like networking that you must do yourself on the target hosts | 11:56 |
jrosser | the setup-hosts playbook is specific things required on all hosts for the openstack deployment | 11:57 |
jrosser | "All you should need before hand is just put your ssh keys inside the target host and thats it" - yes you are missing the point because to some extent every deployment is different at a physical/network level at least, number of interfaces, approach to H/A, storage local/NFS/infiniband/whatever | 11:59 |
jrosser | openstack-ansible allows a very large degree of operator freedom to architect the deployment to meet their own requirements, so it is really not a "shrink wrap installer" | 12:00 |
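For reference, the host preparation being described mostly comes down to creating the OSA bridges before any playbook runs. A minimal netplan sketch follows; the bridge names and subnets mirror the upstream examples, while the interface names, VLAN IDs and addresses are illustrative assumptions only.

```yaml
# /etc/netplan/30-osa-bridges.yaml -- illustrative sketch; interface names,
# VLAN IDs and addresses are assumptions, adjust for your own network design
network:
  version: 2
  ethernets:
    eno1: {}
  vlans:
    eno1.10:                     # management VLAN (assumed ID)
      id: 10
      link: eno1
    eno1.30:                     # storage VLAN (assumed ID)
      id: 30
      link: eno1
  bridges:
    br-mgmt:                     # container/management network
      interfaces: [eno1.10]
      addresses: [172.29.236.11/22]
    br-storage:                  # storage network
      interfaces: [eno1.30]
      addresses: [172.29.244.11/22]
```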
ncuxo | wait, what? all I've seen is: install some packages, set up ssh and do the network bridges depending on the node | 13:02 |
ncuxo | oh yeah, the storage; well, the storage can also be compartmentalised in a playbook | 13:03 |
ncuxo | there are even already-written playbooks for that: linux_system_roles/storage | 13:03 |
opendevreview | Simon Hensel proposed openstack/openstack-ansible-galera_server master: Add optional compression to mariabackup https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/886180 | 14:21 |
NeilHanlon | ncuxo: as jrosser said, the point of the project is not to do everything for the users, but provide a high degree of freedom for it to be customized to your environment. We cannot (and won't) make decisions about how your network, storage, etc, is configured | 18:22 |
NeilHanlon | Doing so would limit the amount of flexibility operators have to use OSA how they like | 18:22 |
ncuxo | okay so I have 5 servers, 3 of which will be in the initial install. I want to have everything on those 3 servers and scale out with whatever services I need. I'm trying to build an HCI deployment where openstack is self-sufficient and doesn't need anything except an external router. How should my storage be configured, then, if openstack doesn't manage my storage? | 18:53 |
NeilHanlon | lowercas_: https://review.opendev.org/c/openstack/openstack-ansible/+/869762/8 | 18:58 |
*** | lowercas_ is now known as lowercase | 18:59 |
opendevreview | Neil Hanlon proposed openstack/openstack-ansible stable/yoga: Drop `else` condition in the container_skel_load loop https://review.opendev.org/c/openstack/openstack-ansible/+/886143 | 19:02 |
opendevreview | Neil Hanlon proposed openstack/openstack-ansible stable/yoga: Drop `else` condition in the container_skel_load loop https://review.opendev.org/c/openstack/openstack-ansible/+/886143 | 19:19 |
opendevreview | Neil Hanlon proposed openstack/openstack-ansible stable/yoga: Add is_nest property for container_skel https://review.opendev.org/c/openstack/openstack-ansible/+/886206 | 19:19 |
jrosser | ncuxo_: openstack has a few different types of storage (volumes / object / images / ephemeral) and supports many backend implementations for those, for example for block storage you can choose from these https://docs.openstack.org/cinder/latest/configuration/block-storage/volume-drivers.html | 19:51 |
jrosser | so to answer "how should my storage be configured" you need to choose which of the storage types you want to implement and which backend you are going to use for them | 19:52 |
jrosser | as an example, it is pretty common to use ceph to provide volume, image and object storage | 19:52 |
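To make the "choose a backend" point concrete, this is roughly what selecting the ceph/RBD driver for block storage looks like in OSA's openstack_user_config.yml; the layout follows the upstream ceph example, with the host name and IP as placeholders.

```yaml
# openstack_user_config.yml fragment (sketch; host name and IP are placeholders)
storage_hosts:
  infra1:
    ip: 172.29.236.11
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        rbd:
          volume_driver: cinder.volume.drivers.rbd.RBDDriver
          volume_backend_name: rbd
          rbd_pool: volumes
          rbd_ceph_conf: /etc/ceph/ceph.conf
          rbd_user: cinder
```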
ncuxo_ | I wanna be able to implement all of them: block, file and object storage | 19:53 |
ncuxo_ | so I have to install ceph outside of openstack? | 19:53 |
jrosser | right - so it is your choice of backend driver | 19:53 |
jrosser | openstack-ansible can deploy ceph because it has an integration with ceph-ansible | 19:54 |
jrosser | though, for various reasons it is a popular choice not to have tight coupling between the ceph deployment and the openstack deployment | 19:54 |
jrosser | obviously that is hard to do with an HCI approach | 19:54 |
jrosser | but HCI does bring its own challenges | 19:55 |
ncuxo_ | exactly because I want to use all the resources each server has | 19:55 |
ncuxo_ | could you point out a few, please? | 19:56 |
jrosser | you would need to have a plan for dealing with resource contention between ceph OSD and your virtual machines, and the control plane processes | 19:56 |
jrosser | how will you prioritise which process should be killed by the OOM killer when ceph memory usage balloons during a large recovery event? | 19:56 |
jrosser | your vm libvirt? mariadb database for openstack? | 19:57 |
ncuxo_ | if openstack installs ceph shouldn't it take care of that ? | 19:57 |
jrosser | openstack is the set of projects that implement the APIs, like nova / cinder | 19:58 |
jrosser | openstack does not install ceph, openstack-ansible does | 19:58 |
ncuxo_ | ok, then doesn't openstack-ansible install a systemd unit that checks for stuff like that? | 19:59 |
jrosser | we have a reference implementation which is not HCI | 19:59 |
jrosser | and from an openstack-ansible perspective we would generally not recommend an HCI approach, though nothing stops you configuring a deployment like that | 19:59 |
ncuxo_ | https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html | 20:00 |
jrosser | yes | 20:00 |
jrosser | the compute hosts are separate from the controllers, and separate from the OSD hosts | 20:00 |
ncuxo_ | and since I'm planning to have it all in one I'm begging for trouble .... | 20:01 |
ncuxo_ | got it now | 20:01 |
mgariepy | what are you guys using for networking? | 20:02 |
jrosser | so you define in nova, for example, how much memory to keep spare for "other things" | 20:02 |
jrosser | and you would need to come up with a figure that was sufficient for ceph + 1/3rd of the control plane | 20:02 |
ncuxo_ | jrosser: only that? then that's not that hard, I can spare an easy 128g of ram for all those vm | 20:03 |
ncuxo_ | s/vm/vns | 20:03 |
ncuxo_ | ,,, cant type today | 20:03 |
jrosser | well like i say ceph memory usage can be wildly unpredictable | 20:04 |
jrosser | steady state is very different from when it's recovering from a major "event" in the cluster, like the loss of a node or something | 20:04 |
ncuxo_ | all my storage is 2g ssds and I'm not planning on making it larger, I prefer to scale out rather than add thicker drives | 20:04 |
ncuxo_ | 2T ssds ... | 20:05 |
jrosser | mgariepy: do you ever try anything converged like this? | 20:05 |
mgariepy | i would not. | 20:05 |
mgariepy | when 1 service needs debugging it's enough for me, i don't need all of them to be down at the same time | 20:05 |
ncuxo_ | mgariepy: but I have 3 hosts so they all should be replicated and in ha state? | 20:06 |
jrosser | normally those would be 3x control plane hosts then you add more as computes | 20:06 |
ncuxo_ | I don't care why something fails just rinse and repeat | 20:06 |
mgariepy | well. when this works sure. | 20:06 |
jrosser | controllers can be smaller resource-wise than compute hosts | 20:07 |
mgariepy | maybe openstack isn't the right solution ? | 20:07 |
ncuxo_ | and I have only beefy servers this is why I need everything to work on the control plane as well | 20:08 |
ncuxo_ | mgariepy: I'm trying to move away from the typical hypervisor infra. I've been checking baremetal k8s and baremetal openstack | 20:09 |
mgariepy | for only a couple server like that i would probably try proxmox | 20:10 |
ncuxo_ | I'm doing 3 as a start then I'll add 2 more and have another 20 waiting for the load | 20:10 |
jrosser | it feels wrong for that quantity of hardware not to have dedicated controllers | 20:11 |
mgariepy | maybe try to have 1 controller and a couple compute ? for the storage i'm not sure. | 20:12 |
jrosser | it really depends on the use case | 20:13 |
jrosser | you would build a cluster dedicated to CI jobs with no shared storage at all | 20:13 |
jrosser | but if uptime/availability were important then you would make different choices | 20:13 |
jrosser | there is not one correct way to build openstack, the point is you architect something that fits the use case | 20:13 |
mgariepy | i tend to build clusters dedicated to users without local storage instead :D but yeah, it depends on the use-case. | 20:14 |
ncuxo_ | jrosser: it really doesn't make sense to waste 48 cores and 768 per server just for the control plane | 20:14 |
jrosser | then personally i would also have some smaller hardware | 20:14 |
mgariepy | you can have a couple of 12 core 128gb nodes for the controllers.. | 20:14 |
jrosser | my test lab has 3x 4 core / 64g controllers for example | 20:15 |
jrosser | super cheap | 20:15 |
ncuxo_ | https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html ok, and then I still need separate hosts for ceph and compute | 20:15 |
jrosser | that's what the reference architecture says | 20:16 |
ncuxo_ | also, what about the LBs? I want them in openstack as well | 20:16 |
jrosser | nothing stops you co-locating ceph & compute, openstack-ansible will deploy that if it's what you want | 20:17 |
mgariepy | how many drive do you have per server? | 20:17 |
jrosser | but then remember it lets you have pretty much any architecture you want | 20:17 |
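A hedged sketch of what that co-located (HCI-style) layout could look like in openstack_user_config.yml: the same three nodes carry the control plane, the ceph mons/OSDs and compute. The group names come from the OSA ceph example, with shared-infra_hosts standing in for the full set of infra groups; host names and IPs are placeholders.

```yaml
# openstack_user_config.yml fragment (sketch; hosts and IPs are placeholders,
# and shared-infra_hosts stands in for the other infra/controller groups)
shared-infra_hosts: &hci_nodes
  node1: {ip: 172.29.236.11}
  node2: {ip: 172.29.236.12}
  node3: {ip: 172.29.236.13}
ceph-mon_hosts: *hci_nodes
ceph-osd_hosts: *hci_nodes     # OSDs on the same boxes, so the contention caveats above apply
compute_hosts: *hci_nodes      # compute on the same boxes, i.e. the HCI layout being discussed
```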
ncuxo_ | jrosser: hmm, if I have the compute and ceph on the same server I can simply add quotas on all my compute hosts to fill up to 80%, and this way even if ceph hogs the memory during a recovery the vms can migrate to the other hosts | 20:18 |
jrosser | like i say, you can tell nova through its config how much host memory should be reserved | 20:19 |
ncuxo_ | mgariepy: 10 drives per server 2t sas ssds | 20:19 |
jrosser | sas.... | 20:19 |
jrosser | no raid controller i hope | 20:20 |
mgariepy | you can also pin cores to vms. | 20:20 |
mgariepy | it's flexible :D | 20:20 |
ncuxo_ | the raid controller is in jbod | 20:20 |
jrosser | maybe reserve 50G, don't know, i'm just guessing | 20:21 |
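In OSA terms, the reservation being guessed at here (and the core pinning mentioned above) would normally land in user_variables.yml as a nova config override; the numbers below are illustrative assumptions only, to be sized against the ceph and control-plane footprint on each host.

```yaml
# user_variables.yml fragment (sketch; both values are illustrative assumptions)
nova_nova_conf_overrides:
  DEFAULT:
    reserved_host_memory_mb: 51200      # hold back ~50G of host RAM from VM scheduling
  compute:
    cpu_dedicated_set: "8-47"           # host cores usable for pinned (dedicated) VM vCPUs,
                                        # leaving 0-7 for ceph OSDs and the control plane
```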
ncuxo_ | probably will leave it at 68 and use the 700 | 20:21 |
jrosser | i am generally more concerned about "day 2 operations" when thinking about this stuff | 20:21 |
jrosser | like how do i upgrade my openstack version, what happens when i need to update the OS major release across the whole cluster | 20:22 |
ncuxo_ | well I have 6 months for planning and testing | 20:22 |
jrosser | what happens when the OS I have does not support the release of ceph that i need | 20:22 |
mgariepy | cephadm.. only needs podman.. lo | 20:23 |
mgariepy | lol | 20:23 |
jrosser | ^ all this is really what becomes your tasks, not worrying about whether you fully utilised some server with HCI or not | 20:23 |
ncuxo_ | I'm confused again ... why should I care about that stuff? if a server is broken, re-provision it and continue with my day? why do I feel I'm missing something here | 20:24 |
mgariepy | it's not micro-services deployed in k8s that auto-respawn when one goes offline. | 20:24 |
ncuxo_ | isn't that the point of self-healing infra, everything is ephemeral? | 20:25 |
mgariepy | you are talking of openstack. | 20:25 |
ncuxo_ | sure, isn't ironic responsible for reprovisioning your host? | 20:25 |
jrosser | not at all | 20:25 |
mgariepy | nop | 20:25 |
jrosser | ironic is a service you can deploy, which will manage baremetal host deployment for your users, as a service | 20:26 |
ncuxo_ | I feel I've been reading them and not understood a thing ... | 20:26 |
ncuxo_ | oh, so it's not meant for the operator, it's meant for the user ... | 20:26 |
jrosser | some tools (the now-deprecated tripleo for example) did use ironic to deploy openstack itself | 20:26 |
jrosser | but that is really not the core purpose of ironic | 20:27 |
jrosser | it can be, and in some cases is, used by the operator too, but that's kind of pretty advanced usage | 20:28 |
ncuxo_ | I'm really starting to think about baremetal k8s with ceph, kubevirt and ironic | 20:28 |
jrosser | right - so it entirely depends on your use case what is suitable | 20:29 |
jrosser | if you want multi-tenancy properly for example, that might be a factor | 20:29 |
ncuxo_ | my idea was to have a vm on my laptop with openstack-ansible as the deployment host, then provision a single host, install what is necessary, and from that one server expand everything out. That is what I'm looking at | 20:31 |
ncuxo_ | that server has all the services inside; it provisions the next server, and while the server count is less than 3 it moves the infra services and the core services across, until I reach a total of 3, then it's just ceph and nova | 20:33 |
jrosser | openstack-ansible is not self-replicating like that | 20:34 |
mgariepy | there is quite a lot of static stuff in osa. | 20:36 |
ncuxo_ | when you mentioned 12 cpu / 128 ram per control plane host, did you mean 12 cpus, not vcpus? | 20:36 |
ncuxo_ | mgariepy: you said earlier that all ceph requires is podman, but in the docs I've only seen docker and lxc containers. So podman is used just for ceph? | 20:39 |
mgariepy | cephadm deploys ceph in podman/docker | 20:39 |
ncuxo_ | I'd prefer it if podman was hardcoded and not docker, but well .... | 20:39 |
mgariepy | i got to run now. family time. | 20:40 |
ncuxo_ | thanks for explaining stuff to me | 20:40 |
jrosser | if openstack-ansible deploys ceph it does not use podman or cephadm | 20:41 |
ncuxo_ | jrosser: can openstack-ansible manage my LBs or do I have to have them separate? | 20:41 |
jrosser | it uses LXC (or not, if you don't want) and distro packages | 20:41 |
jrosser | but most people, when at decent scale choose to decouple ceph from openstack | 20:41 |
jrosser | ncuxo_: which LB? for your openstack API endpoint, or LBAAS via the octavia service? | 20:42 |
ncuxo_ | 25 servers is not so big, at least in my understanding; after listening to some podcasts, big infra is over 500 servers | 20:42 |
ncuxo_ | https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html not sure which one this one is | 20:43 |
jrosser | that is the LB for the dashboard and API endpoints | 20:44 |
jrosser | openstack-ansible deploys haproxy and keepalived for that by default | 20:44 |
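A minimal sketch of that built-in haproxy/keepalived wiring, assuming the VIP addresses and interface names are placeholders for your own environment:

```yaml
# openstack_user_config.yml fragment (sketch; addresses are placeholders)
global_overrides:
  external_lb_vip_address: 203.0.113.10    # public VIP for dashboard/API endpoints
  internal_lb_vip_address: 172.29.236.9    # management-network VIP

# user_variables.yml fragment (sketch; interface names are assumptions)
haproxy_keepalived_external_vip_cidr: "203.0.113.10/32"
haproxy_keepalived_internal_vip_cidr: "172.29.236.9/32"
haproxy_keepalived_external_interface: bond0
haproxy_keepalived_internal_interface: br-mgmt
```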
ncuxo_ | so I don't need something external ... sweet | 20:45 |
jrosser | again you can choose :) | 20:45 |
jrosser | some people like an F5-type appliance | 20:45 |
ncuxo_ | as I've said, outside of the firewall I want all the services to come from openstack: dhcp, dns, lb, ntp | 20:46 |
jrosser | i think this is also maybe not right | 20:47 |
jrosser | you need to provide NTP yourself, for example | 20:47 |
ncuxo_ | can't I have vms which live on openstack and provide the service ? | 20:49 |
jrosser | tbh i think it is worth stepping back and looking at what it takes to provide infrastructure as a service | 20:53 |
jrosser | your openstack hosts cannot, for example, validate SSL certificates unless they have accurate time | 20:54 |
jrosser | and unsynchronised host clocks are disastrous for ceph | 20:54 |
jrosser | so this tells you that as the platform operator, you must have proper sources of fundamentals like NTP as foundations to build your infrastructure on top of | 20:55 |
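One way to provide that foundation before OSA is even involved is a tiny standalone play that puts chrony on every host. This is only a sketch, not part of openstack-ansible; the pool name is an assumption to be replaced with your own trusted time source, and the paths/service name assume a Debian/Ubuntu host.

```yaml
# ntp-prereq.yml -- illustrative pre-deployment play, not part of OSA;
# point the pool at your own trusted time source
- hosts: all
  become: true
  tasks:
    - name: Install chrony
      ansible.builtin.package:
        name: chrony
        state: present

    - name: Configure an upstream time source
      ansible.builtin.copy:
        dest: /etc/chrony/chrony.conf          # /etc/chrony.conf on EL-family hosts
        content: |
          pool 0.pool.ntp.org iburst
          driftfile /var/lib/chrony/chrony.drift
          makestep 1.0 3
      notify: Restart chrony

  handlers:
    - name: Restart chrony
      ansible.builtin.service:
        name: chrony                           # chronyd on EL-family hosts
        state: restarted
        enabled: true
```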
ncuxo_ | jrosser: I can't find an article describing the prerequisite services before deploying openstack | 21:16 |
NeilHanlon | seems I've missed a lively conversation about HCI | 22:26 |
* | NeilHanlon is relieved he missed it | 22:27 |
ncuxo_ | :D we can continue if you were not relieved | 22:27 |
NeilHanlon | :P | 22:28 |