*** mhen_ is now known as mhen | 01:17 | |
*** ykulkarn is now known as yadnesh | 06:45 | |
*** yadnesh is now known as ykulkarn | 08:51 | |
*** rlandy|out is now known as rlandy | 10:27 | |
mhen | Hello, I've got a question about usage of token authentication in the openstackclient via Keystone. | 13:44 |
mhen | If I do "openstack --os-auth-type=token --os-token=... --os-auth-url https://keystone:5000/v3 --os-identity-api-version=3 image list" while passing a token previously acquired via "openstack token issue", I get "The service catalog is empty.". | 13:44 |
mhen | I did unset all OS_* shell variables beforehand. | 13:45 |
mhen | Any idea what I might be missing? Generic password authentication using an openrc file works fine. | 13:45 |
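For context, one likely cause (not confirmed in this log): re-authenticating with `--os-auth-type=token` and no project options yields an unscoped token, and unscoped tokens carry no service catalog. A minimal sketch of passing the project scope alongside the token; the project name and domain below are placeholders:

```bash
# Sketch (assumption, not the confirmed fix): rescope the token by passing
# project options along with it. Project/domain names are placeholders.
openstack --os-auth-type=token \
          --os-token="$OS_TOKEN" \
          --os-auth-url=https://keystone:5000/v3 \
          --os-identity-api-version=3 \
          --os-project-name=demo \
          --os-project-domain-name=Default \
          image list
```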
jeeva | Anyone have any idea? | 14:02 |
jeeva | "Insufficient free space for share creation on host" in /var/log/kolla/manila | 14:02 |
jeeva | when I do "manila extend 'name' newsize" | 14:03 |
jeeva | and I have 2.6 PB free | 14:03 |
jeeva | Insufficient free space for share creation on host B-03-37-openstack-ctl@cephfsnative1#cephfs (requested / avail): 2560000/2492466 | 14:03 |
jeeva | trying to extend it to 2600000 | 14:04 |
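For context on the numbers that follow: `manila extend` takes the new size in GiB, and the "(requested / avail)" pair in the scheduler error is also in GiB. A minimal sketch; the share name is a placeholder:

```bash
# Sketch: check the current size, then request the extension (both in GiB).
manila show scratch-share | grep -w size
manila extend scratch-share 2600000   # fails here with "Insufficient free space"
```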
lowercase | jeeva: is this a case where someone sells a 1TB drive but you get home and it's only 960gb because of the 2^ math? | 14:09 |
lowercase | what i am saying is it might be 2.6P you are typing in, but that might be getting translated to bytes and you might need to go over some. | 14:10 |
jeeva | i dont know where it gets the idea that it is low on diskspace | 14:11 |
jeeva | currently it is "2560000" | 14:11 |
lowercase | what metric is this number in? 2492466 | 14:12 |
lowercase | is that 2MB | 14:12 |
jeeva | 2.49 PB | 14:12 |
lowercase | 2 gigabytes, 2 terabytes | 14:12 |
lowercase | lol | 14:12 |
lowercase | jeeva: i know you think it is, but i'm asking for you to prove it | 14:12 |
jeeva | i don't know where it gets that value from though | 14:13 |
lowercase | one sec | 14:13 |
jeeva | if you take the set value " 2560000" that is 2.56 PB | 14:13 |
jeeva | which the share is currently | 14:14 |
lowercase | and if you do a ceph df, does that number match up? | 14:14 |
lowercase | switching vpns, im gonna drop | 14:14 |
jeeva | 2.5P 2.4P 94T 97% /share | 14:15 |
lowercase | back | 14:15 |
jeeva | 2.5P 2.4P 94T 97% /share | 14:16 |
jeeva | wb | 14:16 |
jeeva | so 2560000 = 2.5P | 14:16 |
lowercase | do that on the ceph cluster, do you have more capacity to give? i.e. does the ceph pool have a limit placed on it | 14:17 |
jeeva | haven't actually checked that ... mmmm | 14:18 |
jeeva | but i have 1.3 PB free, but let me check pool values | 14:18 |
jeeva | max bytes : N/A | 14:19 |
lowercase | i like to use the command, ceph osd pool autoscale-status | 14:19 |
jeeva | so i guess no quota set on cephfs | 14:19 |
lowercase | and look at TARGET-SIZE | 14:19 |
jeeva | no TARGET-SIZE value next to any pools | 14:19 |
lowercase | that's good. | 14:20 |
lowercase | one sec, i'm looking at my own manila config | 14:21 |
jeeva | maybe i should have a check at setfattr as well | 14:21 |
jeeva | maybe a limit on setfattr -n ceph.quota.max_bytes | 14:21 |
jeeva | ceph.quota.max_bytes="2748779069440000" | 14:22 |
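For reference, the CephFS quota being read here lives as an extended attribute on the share's backing directory; a sketch of checking (and, if needed, raising) it, assuming the filesystem is mounted at a placeholder path:

```bash
# Sketch: the quota is a plain xattr on the share's directory (values in bytes).
# The mount point and share path below are placeholders.
getfattr -n ceph.quota.max_bytes /mnt/cephfs/volumes/_nogroup/<share-id>

# To raise it directly (manila/ceph normally manage this for you):
# setfattr -n ceph.quota.max_bytes -v 2858730232217600 /mnt/cephfs/volumes/_nogroup/<share-id>
```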
lowercase | check `manila absolute-limits` | 14:24 |
jeeva | 2748779069440000 bytes is 2.74 PB, but 2.44 PiB | 14:24 |
jeeva | lowercase, https://prnt.sc/gOGMceflXI9l looks pretty vanilla | 14:25 |
jeeva | maxTotalShareGigabytes = 1000 Gb | 14:25 |
lowercase | prnt.sc is blocked by my work | 14:25 |
jeeva | https://zerobin.net/?0f9ab0e20a6c06c6#EV1Ers1Q4Xfq7JMZRMoboAKAIc7UDPBgXPSc8OFV3ew= | 14:26 |
jeeva | 1000 GB is only 0.001 PB though, and i'm far past that | 14:27 |
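Side note, as an assumption since the conversation moves on from this: maxTotalShareGigabytes in `manila absolute-limits` mirrors the project's share-gigabytes quota, and exceeding it would normally surface as a quota error at the API rather than the scheduler's free-space error, so it is probably not the culprit here. A sketch of how one might check and raise it anyway; the project name is a placeholder:

```bash
# Sketch: inspect and raise the per-project share gigabytes quota.
PROJECT_ID=$(openstack project show -f value -c id demo)   # placeholder project
manila quota-show --tenant "$PROJECT_ID"
manila quota-update "$PROJECT_ID" --gigabytes 3000000
```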
lowercase | jeeva: ceph.quota.max_bytes="2748779069440000" | 14:28 |
lowercase | manila extend is in gigabytes | 14:28 |
lowercase | 2600000 gigabytes is .... | 14:28 |
jeeva | yeah, but ceph output is bytes | 14:28 |
lowercase | 2.791729 petabytes | 14:28 |
jeeva | yeah but my manila share is in gigabytes "2560000" | 14:29 |
jeeva | and trying to extend to 2644000-ish | 14:30 |
jeeva | which is below 2.79 PB | 14:30 |
lowercase | currently, but you are trying to extend it to 2600000 gigabytes; plug that into a calculator and it is 2791728742400000 bytes, which exceeds ceph.quota.max_bytes="2748779069440000" | 14:30 |
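Spelling out the arithmetic lowercase is doing (manila sizes are GiB, ceph.quota.max_bytes is bytes):

```bash
# Requested new share size in bytes vs. the directory quota:
echo $(( 2600000 * 1024**3 ))           # 2791728742400000 bytes requested
echo $(( 2748779069440000 / 1024**3 ))  # quota is exactly 2560000 GiB
# i.e. the quota equals the current 2560000 GiB, so any extension exceeds it.
```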
jeeva | thanks, so my suspicion about this calculation is correct | 14:32 |
jeeva | lowercase, thanks for your time & input | 14:32 |
lowercase | anytime. | 14:33 |
jeeva | this is my scratch storage, which is supposed to be ephemeral, but no, "don't delete files on scratch" | 14:33 |
jeeva | lowercase, one last thing | 14:34 |
jeeva | do you think i can increase the ceph quota, and it will not break manila ? | 14:34 |
jeeva | since doesn't the manila command update the ceph quota ? | 14:34 |
lowercase | Honestly, my largest cluster is 2.4PB so i haven't needed to do this. | 14:34 |
jeeva | what is your largest single file ? | 14:35 |
lowercase | I am strictly prohibited from looking at the data on my clusters, so I can honestly say that i don't know. | 14:35 |
jeeva | seriously ? | 14:36 |
jeeva | so you cant even check a single item ? | 14:37 |
jeeva | top secret kek ? | 14:37 |
jeeva | hehe | 14:37 |
lowercase | I do have an MDS cache problem related to cephfs that i haven't dug into. Do you have a way that i can look at the sizes of the data but not the data itself? | 14:37 |
jeeva | what is your mds_cache_memory_limit | 14:39 |
lowercase | let me get into that cluster, one sec | 14:40 |
jeeva | do you have slow / trim MDS issue ? | 14:40 |
lowercase | 1 clients failing to respond to cache pressure | 14:42 |
jeeva | that is a common thing for us in HPC | 14:43 |
lowercase | MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure | 14:43 |
lowercase | mds.alplpmultceph03(mds.0): **** failing to respond to cache pressure client_id: | 14:43 |
jeeva | we normally just let the job continue to run, and then set the client (compute node) to reboot when the job is done | 14:43 |
lowercase | ah shit, i censored the other stuff | 14:43 |
lowercase | nah, this is constant | 14:44 |
jeeva | yeah so you need to increase your mds_cache_memory_limit | 14:44 |
jeeva | i.e. have more memory on the controller | 14:44 |
jeeva | which is one of the issues | 14:44 |
jeeva | alternatively check this doc that i sometimes reference | 14:44 |
jeeva | https://indico.cern.ch/event/588794/contributions/2374222/attachments/1383112/2103509/Configuring_Ceph.pdf | 14:45 |
jeeva | i actually had that issue again last week, but it was caused by a running job that wasn't properly coded | 14:45 |
jeeva | so it messed around with the storage | 14:45 |
jeeva | 25 Gb/s | 14:45 |
jeeva | i was playing around with my MDS cache pressure issues again last week, had another doc, just have to find it now | 14:46 |
jeeva | btw my mds cache memory limit is currently on 100 GB | 14:48 |
lowercase | oh wow, i just bumped it from the default of 4Gi to 12Gi. | 14:48 |
lowercase | the warning went away but time will tell now | 14:48 |
jeeva | oh wow, 4 GB, that is way under-specced | 14:49 |
jeeva | for like 3 basic osd nodes | 14:49 |
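For reference, a sketch of how one might bump that limit on a recent Ceph using the centralized config store (the value is in bytes; whether lowercase applied it this way is an assumption):

```bash
# Sketch: raise the MDS cache memory limit to 12 GiB (value in bytes).
ceph config set mds mds_cache_memory_limit 12884901888
ceph config get mds mds_cache_memory_limit   # verify
```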
lowercase | this is an 18-node cluster with 2.4PB of space; all the drives are behind a RAID controller in RAID 5. | 14:50 |
lowercase | so, 65 osds | 14:50 |
lowercase | 756 gigs of memory in each node. | 14:50 |
jeeva | running a 36-node cluster, 24 x 16 TB drives, 48 cores x 256 GB RAM each | 14:52 |
jeeva | no RAID | 14:52 |
jeeva | ~12 PiB raw | 14:52 |
jeeva | each node has NVMe for rockswal/db partition, and 2x 500 GB SSD for cephfs_metadata | 14:52 |
lowercase | you got nvme | 14:53 |
lowercase | how is the performance on those | 14:53 |
lowercase | we are just about to buy a few racks of them. | 14:53 |
jeeva | read somewhere that people say the difference is minimal, but for me it was a massive change | 14:53 |
jeeva | the POC cluster was exactly the same, but the NVMes weren't configured | 14:53 |
jeeva | it got like 45% of the throughput of the existing one with RockswalDB | 14:54 |
jeeva | on the bluestore (collocated) | 14:54 |
lowercase | I'm not familiar with RockswalDB | 14:55 |
jeeva | ag that is what i call it | 14:56 |
jeeva | let me get the correct name | 14:56 |
jeeva | `bluestore_wal_devices` | 14:56 |
jeeva | https://github.com/facebook/rocksdb | 14:56 |
jeeva | you configure it in your ceph inventory file | 14:57 |
jeeva | B-02-40-cephosd.maas osd_objectstore=bluestore devices="[ '/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm', '/dev/sdn', '/dev/sdo', '/dev/sdp', '/dev/sdq', '/dev/sdr', '/dev/sds', '/dev/sdt', '/dev/sdu', '/dev/sdv', '/dev/sdw', '/dev/sdx' ]" dedicated_devices="[ '/dev/nvme0n1', '/dev/nvme0n1', | 14:58 |
jeeva | '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1', '/dev/nvme0n1' ]" | 14:58 |
lowercase | oh! you're using a device to store the wal and block.db? | 14:59 |
lowercase | I'm here: https://docs.ceph.com/en/octopus/rados/configuration/bluestore-config-ref/ | 14:59 |
lowercase | ill switch over to your doc now | 15:00 |
jeeva | yeah, so it creates partitions in increments of 3 GB, 30 GB, 300 GB to offload it | 15:01 |
jeeva | so just make sure you buy the correct size NVMe | 15:01 |
jeeva | since you can't set that value | 15:01 |
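For readers not using ceph-ansible: the `dedicated_devices` list above puts each OSD's RocksDB block.db (and, with it, the WAL) on the shared NVMe. A rough ceph-volume equivalent, as a sketch with assumed device names:

```bash
# Sketch (device names assumed): 24 spinners sharing one NVMe for block.db/WAL.
ceph-volume lvm batch --bluestore /dev/sd{a..x} --db-devices /dev/nvme0n1
```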
lowercase | Do you have any nvme's as an osd? | 15:01 |
jeeva | nope, only SSDs | 15:01 |
lowercase | have you tested it? | 15:02 |
lowercase | cause this might change how i approach the whole new cluster. | 15:02 |
jeeva | nope, and i don't see that as something i would consider for the time being | 15:02 |
jeeva | if it was VMware vSAN i would consider it | 15:02 |
jeeva | but not with ceph | 15:02 |
lowercase | I'm heavily concerned about the durability of nvme's as an osd. | 15:03 |
jeeva | i have a class SSD2 for our "fast" pool & a separate SSD class pool for cephfs_metadata | 15:03 |
lowercase | yeah same, we got spinners for our slow pool and ssds for our fast pool | 15:04 |
jeeva | we have big files, so nvme wouldn't be feasible | 15:04 |
jeeva | we got users with like 10 TB single files | 15:04 |
lowercase | that's very large. | 15:04 |
lowercase | oh you said you work in hpc. | 15:04 |
jeeva | ya well, they take pictures of black holes with 64 dishes | 15:04 |
jeeva | data intensive astronomy | 15:05 |
lowercase | you one of those cern guys | 15:05 |
jeeva | no no, im in south africa | 15:05 |
jeeva | Inter-university Institute for Data Intensive Astronomy | 15:05 |
lowercase | https://www.bbc.com/news/science-environment-47891902 | 15:06 |
lowercase | okay, so that one isn't you guys | 15:06 |
jeeva | no, as far as i know our system did play a part in it | 15:07 |
jeeva | well not the dishes | 15:08 |
jeeva | but the HPC part | 15:08 |
jeeva | lowercase: thanks for the mental jousting, `ceph fs subvolume resize cephfs 5334a96f-3cbc-4447-8187-7e61219a243f 2858730232217600` | 15:11 |
jeeva | was the fix | 15:11 |
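For reference, the syntax is `ceph fs subvolume resize <vol_name> <subvol_name> <new_size_in_bytes>`; a sketch restating the fix with a follow-up check (the confirmation command is an addition, not from the log):

```bash
# The fix from the log: grow the CephFS subvolume backing the manila share.
# 2858730232217600 bytes == exactly 2662400 GiB, above the requested extension.
ceph fs subvolume resize cephfs 5334a96f-3cbc-4447-8187-7e61219a243f 2858730232217600

# Confirm the new quota/size:
ceph fs subvolume info cephfs 5334a96f-3cbc-4447-8187-7e61219a243f
```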
lowercase | glad i was able to help and meet a cool new friend in the process. | 15:12 |
jeeva | ditto | 15:12 |
jeeva | :) | 15:12 |
jeeva | now these researchers can go apeshit over the weekend, since it's a long weekend and i don't have to worry about it running full before then | 15:13 |
jeeva | lowercase, how much memory do your controllers have? and do you run mgr/mds/mon on them? | 15:15 |
jeeva | should probably have moved this to #ceph | 15:15 |
lowercase | i get banned from there frequently because i join/part too often | 15:15 |
lowercase | let's see if the bot allows me in | 15:16 |
jeeva | run a BNC :P | 15:16 |
lowercase | i could... but i already work enough hours doing this stuff. | 15:16 |
jeeva | anyway, after 5PM, home time! | 15:19 |
jeeva | or rather, i WFH, time to move away from the desk | 15:20 |
lowercase | welcome to your weekend. | 15:20 |
lowercase | have a good one | 15:20 |