Friday, 2022-08-05

*** mhen_ is now known as mhen 01:17
*** ykulkarn is now known as yadnesh 06:45
*** yadnesh is now known as ykulkarn 08:51
*** rlandy|out is now known as rlandy 10:27
mhenHello, I've got a question about usage of token authentication in the openstackclient via Keystone.13:44
mhenIf I do "openstack --os-auth-type=token --os-token=... --os-auth-url https://keystone:5000/v3 --os-identity-api-version=3 image list" while passing a token previously acquired via "openstack token issue", I get "The service catalog is empty.".13:44
mhenI did unset all OS_* shell variables beforehand.13:45
mhenAny idea what I might be missing? Generic password authentication using an openrc file works fine.13:45
jeevaAnyone have any idea?14:02
jeeva'Insufficient free space for share creation on host' /var/log/kolla/manila"14:02
jeevawhen I do "manila extend 'name' newsize"14:03
jeevaand I have 2.6 PB free14:03
jeevaInsufficient free space for share creation on host B-03-37-openstack-ctl@cephfsnative1#cephfs (requested / avail): 2560000/2492466 14:03
jeevatrying to extend it to 2600000 14:04
lowercasejeeva: is this a case where someone sells a 1TB drive but you get home and it's only 960GB because of the 2^ math?14:09
lowercasewhat i am saying is it might be 2.6P you are typing in, but that might be getting translated to bytes and you might need to go over some.14:10
jeevai dont know where it gets the idea that it is low on diskspace14:11
jeevacurrently it is "2560000"14:11
lowercasewhat metric is this number in? 2492466 14:12
lowercaseis that 2MB14:12
jeeva2.49 PB14:12
lowercase2 gigabytes, 2 terabytes14:12
lowercasejeeva: i know you think it is, but i'm asking for you to prove it14:12
jeevai dont know where it gets that value from though14:13
lowercaseone sec14:13
jeevaif you take the set value " 2560000" that is 2.56 PB14:13
jeevawhich the share is currently14:14
lowercaseand if you do a ceph df, does that number match up?14:14
lowercaseswitching vpns, im gonna drop14:14
jeeva2.5P  2.4P   94T  97% /share14:15
jeeva2.5P  2.4P   94T  97% /share14:16
jeevaso 2560000 = 2.5P14:16
lowercasedo that on the ceph cluster, do you have more capacity to give? i.e. does the ceph pool have a limit placed on it14:17
jeevahaven't actually checked that ... mmmm14:18
jeevabut i have 1.3 PB free, but let me check pool values14:18
jeevamax bytes  : N/A14:19
lowercasei like to use the command, ceph osd pool autoscale-status14:19
jeevaso i guess no quota set on cephfs14:19
lowercaseand look at TARGET-SIZE14:19
jeevano TARGET-SIZE value next to any pools14:19
lowercasethat's good.14:20
lowercaseone sec, im looking at my own manila config14:21
jeevamaybe i should have a check at setfattr as well14:21
jeevamaybe a limit on setfattr -n ceph.quota.max_bytes14:21
lowercasecheck `manila absolute-limits`14:24
jeeva2748779069440000 is 2.74 PB, but 2.44PiB14:24
jeevalowercase, looks pretty vanilla14:25
jeevamaxTotalShareGigabytes  = 1000 Gb14:25 is blocked by my work14:25
jeeva1000 gb is 0.001 PB though, and im far past that14:27
lowercasejeeva: ceph.quota.max_bytes="2748779069440000"14:28
lowercasemanila extend is in gigabytes14:28
lowercase2600000 gigabytes is ....14:28
jeevayeah, but ceph output is bytes14:28
lowercase2.791729 petabytes14:28
jeevayeah but my manila share is in gigabytes "2560000"14:29
jeevaand trying to extend to 2644000-ish14:30
jeevawhich is below 2.79 Pb14:30
lowercasecurrently, but you are trying to extend it to 2600000 gigabytes, which, plugged into a calculator, is 2791728742400000 bytes, and that exceeds ceph.quota.max_bytes="2748779069440000"14:30
jeevathanks, so my suspicion about this calculation is correct14:32
jeevalowercase, thanks for your time & input14:32
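
The unit mismatch discussed above can be checked in a few lines. This is a minimal sketch, assuming the sizes passed to manila are counted in units of 2^30 bytes (which is what the byte values quoted in the channel imply); the function name `gib_to_bytes` is made up for illustration.

```python
# Sketch only: assumes manila share sizes are in GiB (2**30 bytes),
# while ceph.quota.max_bytes is a plain byte count.
GIB = 2**30


def gib_to_bytes(size_gib: int) -> int:
    """Convert a manila-style size in GiB to bytes."""
    return size_gib * GIB


ceph_quota_bytes = 2748779069440000  # ceph.quota.max_bytes quoted in the log
current_gib = 2_560_000              # current share size
requested_gib = 2_600_000            # size passed to "manila extend"

current_bytes = gib_to_bytes(current_gib)
requested_bytes = gib_to_bytes(requested_gib)

# The existing quota is exactly the current share size in bytes.
print(current_bytes == ceph_quota_bytes)
# The requested size, in bytes, and whether it is above the quota.
print(requested_bytes, requested_bytes > ceph_quota_bytes)
```

With these inputs the current share size and the quota are the same byte count, and the requested size lands above it, which matches lowercase's calculator result.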
jeevathis is my scratch storage, that is supposed to be ephemeral, but no, "don't delete files on scratch"14:33
jeevalowercase, one last thing14:34
jeevado you think i can increase the ceph quota, and it will not break manila ?14:34
jeevasince doesn't the manila command update the ceph quota ?14:34
lowercaseHonestly, my largest cluster is 2.4PB so i haven't needed to do this.14:34
jeevawhat is your largest single file ?14:35
lowercaseI am strictly prohibited from looking at the data on my clusters, so I can honestly say that i don't know.14:35
jeevaseriously ?14:36
jeevaso you cant even check a single item ?14:37
jeevatop secret kek ?14:37
lowercaseI do have an MDS cache problem related to cephfs that i haven't dug into. Do you have a way that i can look at sizes of data but not the data itself?14:37
jeevawhat is your mds_cache_memory_limit14:39
lowercaselet me get into that cluster, one sec14:40
jeevado you have slow / trim MDS issue ?14:40
lowercase1 clients failing to respond to cache pressure14:42
jeevathat is a common thing for us in HPC14:43
lowercaseMDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure14:43
lowercase    mds.alplpmultceph03(mds.0): **** failing to respond to cache pressure client_id:14:43
jeevawe normally just let the job continue to run, and then set the client (compute node) to reboot when the job is done14:43
lowercaseah shit, i censored the other stuff14:43
lowercasenah, this is constant14:44
jeevayeah so you need to increase your mds_cache_memory_limit14:44
jeevai.e. have more memory on the controller14:44
jeevawhich is one of the issues14:44
jeevaalternatively check this doc that i sometimes reference14:44
jeevai actually had that issue last week again, but it was a job that was running that wasn't properly coded14:45
jeevaso it messed around with the storage14:45
jeeva25 Gb/s14:45
jeevai was playing around with my MDS cache pressure issues last week again, had another doc, just have to find it now14:46
jeevabtw my mds cache memory limit is currently on 100 GB14:48
lowercaseoh wow, i just bumped it from the default of 4Gi to 12Gi.14:48
lowercasethe warning went away but time will tell now14:48
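
The `mds_cache_memory_limit` option is set in bytes, so the 4Gi and 12Gi figures above have to be converted first. A small sketch of that conversion follows; the helper name `mds_cache_limit_command` is hypothetical, and the log does not say how lowercase actually applied the change, so the `ceph config set` form is shown only as one common way to do it.

```python
# Sketch only: builds the command text, it does not run anything.
GIB = 2**30


def mds_cache_limit_command(limit_gib: int) -> str:
    """Return a 'ceph config set' line for an MDS cache limit given in GiB."""
    limit_bytes = limit_gib * GIB
    return f"ceph config set mds mds_cache_memory_limit {limit_bytes}"


# 4 GiB is the default lowercase mentions, 12 GiB is the new value.
print(mds_cache_limit_command(4))
print(mds_cache_limit_command(12))
```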
jeevaoh wow, 4 GB that is way under specced14:49
jeevafor like 3 basic osd nodes14:49
lowercasethis is a 18 node cluster, with 2.4PB of space, all the drives are behind a raid controller at raid 5.14:50
lowercaseso, 65 osds14:50
lowercase756 gigs of memory each node.14:50
jeevarunning 36 node cluster, 24 x 16 TB , 48 core x 256 gb ram each14:52
jeevano RAID14:52
jeeva~12 Pib Raw14:52
jeevaeach node has NVMe for rockswal/db partition, and 2x 500 GB SSD for cephfs_metadata14:52
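
The "~12 PiB raw" figure can be reproduced from the node count and drive size given above. A quick check, assuming "16 TB" means decimal terabytes (10^12 bytes) and that all 24 data drives in each of the 36 nodes are that size:

```python
# Sketch only: raw capacity before replication or erasure-coding overhead.
TB = 10**12   # decimal terabyte, as drive vendors count it
PIB = 2**50   # binary pebibyte

nodes = 36
drives_per_node = 24
drive_size_tb = 16

raw_bytes = nodes * drives_per_node * drive_size_tb * TB
raw_pib = raw_bytes / PIB

print(f"{raw_bytes / 10**15:.3f} PB raw, {raw_pib:.2f} PiB raw")
```

This is the same decimal-versus-binary gap that came up earlier with the share quota: the drives add up to about 13.8 PB, which is a little over 12 PiB.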
lowercaseyou got nvme14:53
lowercasehow is the performance on those14:53
lowercasewe are just about to buy a few racks of them.14:53
jeevaread somewhere people say the difference is minimal but for me it was a massive change14:53
jeevaPOC cluster was exactly the same, but the NVMes weren't configured14:53
jeevagot like 45% throughput of the existing one with RockswalDB14:54
jeevaon the bluestore (collocated)14:54
lowercaseI'm not familiar with RockswalDB14:55
jeevaag that is what i call it14:56
jeevalet me get the correct name14:56
jeevayou configure it in your ceph inventory file14:57
jeevaB-02-40-cephosd.maas osd_objectstore=bluestore devices="[ '/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm', '/dev/sdn', '/dev/sdo', '/dev/sdp', '/dev/sdq', '/dev/sdr', '/dev/sds', '/dev/sdt', '/dev/sdu', '/dev/sdv', '/dev/sdw', '/dev/sdx' ]" dedicated_devices="[ '/dev/nvme0n1', '/dev/nvme0n1', 14:58
jeeva'/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1' ]"14:58
lowercaseoh! you're using a device to store the wal and block.db?14:59
lowercaseI'm here:
lowercaseill switch over to your doc now15:00
jeevayeah, so it creates increments of 3 GB , 30 GB, 300 GB partitions to offload it15:01
jeevaso just make sure you buy the correct size NVMe15:01
jeevasince you can't set that value15:01
lowercaseDo you have any nvme's as an osd?15:01
jeevanope, only SSDs15:01
lowercasehave you tested it?15:02
lowercasecause this might change how i approach the whole new cluster.15:02
jeevanope, but i dont think i see that as something i would consider at the time being15:02
jeevaif it was VMware vSAN i would consider it15:02
jeevabut not with ceph15:02
lowercaseI'm heavily concerned about the durability of nvme's as an osd.15:03
jeevai have a class SSD2 for our "fast" pool & separate SSD class pool for cephfs_metadata15:03
lowercaseyeah same, we got spinners for our slow pool and ssds for our fast pool15:04
jeevawe have big files, so nvme wouldn't be feasible15:04
jeevawe got users with like 10 TB single files15:04
lowercasethat's very large.15:04
lowercaseoh you said you work in hpc.15:04
jeevaya well, they take pictures of blackholes with 64 dishes15:04
jeevadata intensive astronomy15:05
lowercaseyou one of those cern guys15:05
jeevano no, im in south africa15:05
jeevaInter-university Institute for Data Intensive Astronomy15:05
lowercaseokay, so that one isn't you guys15:06
jeevano as far as i know our system did play a part in it15:07
jeevawell not the dishes15:08
jeevabut the HPC part15:08
jeevalowercase: thanks for the mental jousting, ceph fs subvolume resize cephfs 5334a96f-3cbc-4447-8187-7e61219a243f 2858730232217600 15:11
jeevawas the fix15:11
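
The byte value passed to `ceph fs subvolume resize` can be decoded back into the units used earlier in the conversation. A minimal sketch, using only numbers quoted in the log; the variable names are illustrative.

```python
# Sketch only: decodes the size argument from the resize command in the log.
GIB = 2**30
TIB = 2**40

old_quota_bytes = 2748779069440000  # ceph.quota.max_bytes before the fix
new_size_bytes = 2858730232217600   # argument to "ceph fs subvolume resize"

new_size_gib = new_size_bytes // GIB
new_size_tib = new_size_bytes // TIB
added_tib = (new_size_bytes - old_quota_bytes) // TIB

print(new_size_gib, "GiB")
print(new_size_tib, "TiB")
print(added_tib, "TiB added over the old quota")
```

The new size is a whole number of TiB, so the value was most likely computed in binary units rather than typed as a round decimal figure.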
lowercaseglad i was able to help and meet a cool new friend in the process.15:12
jeevanow these researchers can go apeshit over the weekend, since it's a long weekend and i dont have to worry about it running full before then15:13
jeevalowercase, how much memory do your controllers have? and do you run mgr/mds/mon on it ?15:15
jeevashould probably have moved this to #ceph15:15
lowercasei get banned from there frequently because i join/part too often15:15
lowercaselet's see if the bot allows me in15:16
jeevarun a BNC :P15:16
lowercasei could... but i already work enough hours doing this stuff.15:16
jeevaanyway, after 5PM, home time!15:19
jeevaor rather, i WFH, time to move away from the desk15:20
lowercasewelcome to your weekend.15:20
lowercasehave a good one15:20
