cloudnull | noonedeadpunk sorry, I'm replying late. I'm out of the loop, but the strategy plugin may not actually be needed with modern Ansible. Much of the strategy plugin was trying to do forward lookups on tasks and omit them when conditions were not met, all without ever loading the task into the Ansible engine. We also were changing the connection plugin | 01:43 |
---|---|---|
cloudnull | when dealing with specific scenarios to use paramiko or ssh. All of this is probably not really needed any more. | 01:43 |
cloudnull | I think the only thing that would actually need to be kept is the magic variable mapping | 01:44 |
cloudnull | https://github.com/openstack/openstack-ansible-plugins/blob/master/plugins/strategy/linear.py#L48 | 01:44 |
cloudnull | but then again, I'm not really sure. | 01:44 |
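To experiment with dropping the custom strategy, the play-level `strategy` keyword can pin the stock linear strategy for a single run. A minimal sketch, assuming the standard OSA `all_containers` group and a working connection plugin in inventory (the play name and task are hypothetical):

```yaml
# Hypothetical test play: force the stock 'linear' strategy on one play to
# compare behaviour/performance against the OSA strategy override.
- name: Smoke-test the stock linear strategy
  hosts: all_containers
  gather_facts: false
  strategy: linear
  tasks:
    - name: Ping each container through the configured connection plugin
      ansible.builtin.ping:
```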
cloudnull | ThiagoCMC I've seen that error before; the issue for me was that the Galera cluster health check with xinetd (I think OSA uses something else now) wasn't permitting the traffic from the HAProxy node. To fix it I had to change the allow address for the health check config; in general I would set it to something like 0.0.0.0/0 for testing, then lock | 01:50 |
cloudnull | it down once it was working again. | 01:50 |
cloudnull | https://github.com/openstack/openstack-ansible-galera_server/blob/master/defaults/main.yml#L82 | 01:51 |
cloudnull | here's the implementation - https://github.com/openstack/openstack-ansible-galera_server/blob/8a8d29ea490fba6695e3356831846466f6991089/tasks/galera_server_post_install.yml#L60 | 01:52 |
cloudnull | I guess that's using clustercheck these days. | 01:53 |
cloudnull | but same basic idea | 01:53 |
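As a sketch of the workaround described above, the allowed source for the health check can be overridden from the deployment host, assuming the variable is still `galera_monitoring_allowed_source` as in the linked defaults file:

```yaml
# /etc/openstack_deploy/user_variables.yml
# Open the Galera monitoring/health check to everything while debugging,
# then lock it back down to the HAProxy/internal subnet once it works.
galera_monitoring_allowed_source: "0.0.0.0/0"
```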
prometheanfire | cloudnull: ohai | 02:29 |
noonedeadpunk | cloudnull: aha, thanks for explaining it a bit. The SSH connection plugin is still needed though, as we don't have ssh inside LXC and it's used to do lxc-attach from lxc_hosts | 08:12 |
noonedeadpunk | and indeed it looks like ansible-core 2.13 doesn't really need our strategy, but I definitely see a performance improvement with 2.11 | 08:12 |
jamesdenton | prometheanfire makes complete sense and demonstrates my lack of testing with LXC, only metal | 13:44 |
jamesdenton | So we may need to adjust that strategy, then | 13:44 |
Mohaa | "Host evacuation [for Masakri] requires shared storage and some method of fencing nodes, likely provided by Pacemaker/STONITH and access to the OOB management network. Given these requirements and an incomplete implementation within OpenStack-Ansible at this time, I’ll skip this demonstration." (jamesdenton's blog post, 20201) | 14:12 |
jamesdenton | hi | 14:13 |
Mohaa | (: | 14:13 |
Mohaa | Hi | 14:13 |
jamesdenton | i haven't looked at it again since then | 14:14 |
Mohaa | Is that feature not yet implemented in openstack-ansible? | 14:14 |
Mohaa | Ah, ok | 14:15 |
jamesdenton | Well, TBH I am not sure about any sort of pacemaker role in OSA, and I'm pretty sure we don't have any sort of reference arch for this | 14:17 |
Mohaa | Oops! Instance HA is necessary for a production environment! | 14:26 |
jamesdenton | depends on who you ask. pets vs cattle, etc. | 14:30 |
prometheanfire | jamesdenton: I'm willing to test patches for it :D | 15:32 |
spatel | what SSDs do you guys prefer for Ceph nowadays? | 15:48 |
spatel | Intel or Samsung and what model? | 15:48 |
damiandabrowski | can't help you with the vendor, but recently I learned that enterprise-grade SSDs may behave a lot better for Ceph due to the way they handle fsyncs: https://forum.proxmox.com/threads/ceph-shared-ssd-for-db-wal.80165/post-397544 | 15:56 |
damiandabrowski | so even though my Samsung 980 Pros' performance looks good on paper, they probably suck for Ceph. I haven't had time for a real comparison though | 15:57 |
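One way to check the fsync behaviour mentioned above is a single-job 4k synchronous write test. A sketch as an ad-hoc playbook, assuming fio is installed on the target machines and that a `storage_hosts` group exists in inventory (both are assumptions); enterprise drives with power-loss protection usually hold up here, consumer drives collapse:

```yaml
# Hypothetical benchmark play: 4k synchronous writes at queue depth 1 --
# roughly the pattern a Ceph journal/WAL imposes on a device.
- name: Compare sync-write behaviour of candidate SSDs
  hosts: storage_hosts
  tasks:
    - name: Run a 60s 4k sync write test against a scratch file
      ansible.builtin.command: >
        fio --name=synctest --filename=/var/tmp/fio-synctest --size=1G
        --rw=write --bs=4k --iodepth=1 --numjobs=1 --sync=1
        --runtime=60 --time_based --group_reporting
      register: fio_result

    - name: Show the results
      ansible.builtin.debug:
        var: fio_result.stdout_lines
```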
admin1 | spatel.. Intel.. it generally has very high endurance compared to Samsungs | 15:58 |
spatel | Model? | 16:00 |
admin1 | p* series | 16:00 |
spatel | damiandabrowski Samsung 980 pro is consumer SSD correct? | 16:00 |
admin1 | dont recall the exact model | 16:00 |
damiandabrowski | spatel: yeah | 16:01 |
spatel | They suck..! Ask me.. I had to upgrade my whole cluster to PM883s 4 years ago.. | 16:01 |
spatel | consumer SSDs suck.. | 16:01 |
spatel | admin1 thanks, I will try to find a P* model and see which one fits my pocket :) | 16:02 |
admin1 | spatel, S4500 also has a lot of endurance | 16:02 |
spatel | Assuming we don't need a dedicated journal SSD with Intel SSDs | 16:02 |
admin1 | if you are going to have a bigger cluster or a chance of growth, then an NVMe journal might become a bottleneck | 16:03 |
spatel | https://www.amazon.com/Intel-SSDSC2KB019T701-S4500-Reseller-Single/dp/B074QSB52M/ref=sr_1_3?crid=1B4LWHP6M79LY&keywords=Intel+S4500+2TB&qid=1677254585&s=electronics&sprefix=intel+s4500+2tb%2Celectronics%2C79&sr=1-3&ufe=app_do%3Aamzn1.fos.304cacc1-b508-45fb-a37f-a2c47c48c32f | 16:03 |
damiandabrowski | admin1: oh, so you generally do not recommend having NVMe WAL&DB for SSD OSDs? | 16:04 |
admin1 | I do not if you are going to have 100s of SSDs, or say a public cloud where you need to add dozens per month for growth, for example | 16:04 |
spatel | Why do we need a dedicated journal for SSDs? Why create a single point of failure? | 16:05 |
spatel | i can understand with HDD | 16:05 |
admin1 | this is where *endurance* comes into play | 16:05 |
damiandabrowski | I guess performance. Ceph docs suggest putting the WAL on a faster device type, and NVMe performance is generally better than standard SSDs | 16:06 |
damiandabrowski | but i'd love to see some benchmarks comparing colocated vs. external WAL for SSD OSDs | 16:06 |
damiandabrowski | and also what is the optimal number of SSD OSDs per NVMe WAL (I have 4 OSDs per NVMe WAL but not sure if it's the best ratio) | 16:08 |
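For reference, with cephadm this kind of split can be expressed declaratively in an OSD service spec (drive groups), applied with `ceph orch apply -i osd_spec.yml`. A sketch only; the device filters below are hypothetical and need to match the actual drives in the hosts:

```yaml
# osd_spec.yml -- example drive-group spec; filters are assumptions
service_type: osd
service_id: ssd_osds_nvme_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 0          # non-rotational drives (the SATA SSDs) become OSDs
  db_devices:
    model: 'EXAMPLE-NVME'  # hypothetical model filter for the NVMe DB/WAL drives
```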
spatel | damiandabrowski you do use NVMe for the SSD journal? | 16:09 |
damiandabrowski | yeah | 16:09 |
spatel | How many SSDs do you have per server? | 16:10 |
spatel | I meant OSDs | 16:10 |
damiandabrowski | it's a small cluster, I have 3 storage servers, each one has: 6 HDD OSDs + 1 SSD journal and 4 SSD OSDs + 1 NVMe journal | 16:13 |
damiandabrowski | but honestly, I don't know if it's a good ratio or not :D cluster performance is quite bad, but probably due to consumer grade SSDs | 16:13 |
spatel | consumer-grade SSDs are really really bad for Ceph (I went through that pain) | 16:17 |
spatel | Do you have dedicated MON nodes, or are they running with OSDs? Because of budget I am running MON+OSD on the same node | 16:18 |
spatel | Bad, bad idea.. but the plan is to split them when we have money | 16:18 |
damiandabrowski | I'm running MONs on openstack controllers | 16:18 |
damiandabrowski | (in LXC containers) | 16:20 |
spatel | ohh | 16:21 |
damiandabrowski | and yeah..probably I'll have to replace my samsungs sooner or later :| | 16:21 |
spatel | I am deploying with cephadm | 16:21 |
spatel | I did replace with PM883 SSD and happy with performance. | 16:21 |
damiandabrowski | well, probably you can create a ceph-mon group in OSA, let it prepare containers and do the rest with cephadm | 16:23 |
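The suggestion above would look roughly like this in the deployment config. A sketch assuming the `ceph-mon_hosts` group from OSA's ceph-ansible integration and example hostnames/IPs:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (fragment, example values)
# OSA creates LXC containers for the monitors on these hosts.
ceph-mon_hosts:
  infra1:
    ip: 172.29.236.11
  infra2:
    ip: 172.29.236.12
  infra3:
    ip: 172.29.236.13
```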
spatel | cephadm uses docker/podman | 16:23 |
damiandabrowski | ahhh that's true, forgot about it :| | 16:23 |
spatel | haha!! | 16:23 |
spatel | what a mess.. | 16:23 |
spatel | which numbers should we look at in this smartctl output - https://paste.opendev.org/show/bB5XMOXki9y1Tt0igBdV/ | 16:24 |
damiandabrowski | regarding containers for ceph: i love that title :D https://www.youtube.com/watch?v=pPZsN_urpqw | 16:25 |
damiandabrowski | regarding smartctl, I think most attributes are meaningful, but instead of manually reading smartctl output, I prefer to use smartd and just wait for emails :D | 16:29 |
prometheanfire | jamesdenton: as far as metal vs lxc, is metal becoming the more 'supported' option by OSA? | 16:29 |
damiandabrowski | https://github.com/stuvusIT/smartd - i used this role some time ago | 16:30 |
mgariepy | damiandabrowski, yeah I saw this one last year :D they almost killed ceph-ansible because of that though.. :P | 16:41 |
damiandabrowski | mgariepy: oops, but at least with ceph-ansible you can choose whether you want a containerized deployment or not :D https://github.com/ceph/ceph-ansible/blob/main/group_vars/all.yml.sample#L583 | 16:44 |
damiandabrowski | can't say the same about cephadm :| | 16:44 |
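That toggle is the one referenced in the linked sample file:

```yaml
# group_vars/all.yml (ceph-ansible)
# false = package-based (bare-metal) daemons, true = containerized daemons
containerized_deployment: false
```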
spatel | damiandabrowski I will look into smartd | 16:47 |
spatel | But how do I know how much life is left on my SSD here https://paste.opendev.org/show/bB5XMOXki9y1Tt0igBdV/ | 16:48 |
jamesdenton | prometheanfire all of our deploys have been "metal" since Stein | 16:50 |
spatel | damiandabrowski what RAID controller card do you prefer for Ceph? | 16:55 |
damiandabrowski | spatel: I think you can consider this disk as worn out, but if you want to save some money, it should be ok to keep it until you see some errors (like Uncorrectable_Error_Cnt) | 17:01 |
spatel | +1 | 17:02 |
damiandabrowski | regarding the RAID controller, I just use some random HBA. AFAIK Ceph does not recommend RAID controllers | 17:02 |
prometheanfire | noonedeadpunk: is metal considered more supported now than lxc, where dev effort / support is likely to be going? asking for a new install | 17:02 |
spatel | What would that random HBA be? | 17:05 |
spatel | Any model I should follow.. | 17:05 |
spatel | is it possible to use RAID + JBOD on the same controller? | 17:05 |
spatel | Like RAID for the OS disks and JBOD for the rest | 17:05 |
damiandabrowski | don't remember the exact model, something from Broadcom | 17:15 |
damiandabrowski | I think it depends on the controller; a few years ago I was working with one that didn't support JBOD at all | 17:15 |
damiandabrowski | and I had to configure RAID0 separately on each disk xD | 17:16 |
damiandabrowski | or maybe i had to create RAID1 on each disk because RAID0 was not supported? don't remember, it was a long time ago | 17:20 |
spatel | That is what I have in my old Ceph, all RAID0 | 17:30 |
admin1 | yeah . | 17:30 |
spatel | I never worked with JBOD so no idea what models there are and how good they are | 17:30 |
spatel | Do you enable RAID0 with write-back cache on the HBA controller? | 17:31 |
admin1 | spatel, does the controller have its own battery unit? | 17:35 |
spatel | admin1 yes controller has battery | 18:09 |
spatel | small one :) | 18:09 |
noonedeadpunk | prometheanfire: I don't think it matters much, to be frank. But metal with Ceph will be troublesome if both glance and cinder use it as a backend, so you will need to isolate some of them. And troublesome not during deployment, but during further operations | 18:18 |
noonedeadpunk | prometheanfire: I personally am still running LXC and not going to migrate off it | 18:18 |
prometheanfire | good to know about ceph | 18:18 |
noonedeadpunk | (as of today) | 18:18 |
noonedeadpunk | so it's more about your preference, to be frank | 18:19 |
prometheanfire | ya, I like the containers, being able to blow things away is nice | 18:19 |
noonedeadpunk | Well, if you have ironic you can blow away controllers as well... But yeah, limiting impact might be good sometimes | 18:19 |
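For what it's worth, opting individual hosts (or everything) out of LXC is done per host in the user config. A sketch with an example host and IP, assuming the `no_containers` flag still works as documented:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (fragment, example values)
shared-infra_hosts:
  infra1:
    ip: 172.29.236.11
    no_containers: true   # deploy this host's services on metal, no LXC
```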
prometheanfire | ironic for the controllers? snake eating its own tail :P | 18:36 |
spatel | prometheanfire haha!! controller running ironic and ironic building the controller.. too much fun | 19:17 |
noonedeadpunk | it can be standalone like bifrost for instance | 20:04 |
prometheanfire | true | 20:18 |
admin1 | anyone tried Trove? | 20:23 |