*** hemna6 is now known as hemna | 07:37 | |
*** dasm|off is now known as dasm | 13:25 | |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Support use_multipath for NVME driver https://review.opendev.org/c/openstack/nova/+/823941 | 16:05 |
spatel | Folks, i have glusterfs mounted on /var/lib/nova as shared storage for my vms and everything is working, but when i delete a vm it doesn't clean up the files. i can still see the disk and other files, which take up a lot of space. is that normal for nova? | 17:37 |
sean-k-mooney | hum technically we don't support mounting /var/lib/nova on glusterfs, however the shared file system support we have for mounting it on NFS should work with it | 17:38 |
sean-k-mooney | the way we detect a shared file system is by touching a file, which should just work with any shared file system | 17:39 |
sean-k-mooney | in terms of deleting files | 17:39 |
sean-k-mooney | when you delete the vm it should delete the vm disk and other files in general | 17:39 |
sean-k-mooney | the backing files for the guest image may stay in the image cache for a period of time but they will eventually get cleaned up | 17:40 |
sean-k-mooney | i wonder if this is like the ceph issue | 17:40 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.ensure_libvirt_rbd_instance_dir_cleanup | 17:40 |
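The "touch a file" check described above can be sanity-checked by hand. A minimal sketch, assuming two compute hosts named compute01 and compute02 (hypothetical names) that both mount the GlusterFS volume at /var/lib/nova/instances:

```shell
# Create a marker file on one compute node...
ssh compute01 'touch /var/lib/nova/instances/.shared_storage_check'
# ...and confirm the other node sees it right away; on working shared
# storage this should list the file immediately.
ssh compute02 'ls -l /var/lib/nova/instances/.shared_storage_check'
# Clean up the marker afterwards.
ssh compute01 'rm /var/lib/nova/instances/.shared_storage_check'
```

If the second host does not see the file promptly, shared-storage detection (and the deletion behaviour that depends on it) may not work as expected.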
spatel | hmm, why did you say you don't support glusterfs for /var/lib/nova? | 17:41 |
sean-k-mooney | spatel: because we don't officially support or test that upstream | 17:41 |
sean-k-mooney | we test /var/lib/nova on nfs even though we don't recommend using it | 17:42 |
spatel | but generally shared storage is just storage: create files and delete files.. no matter what is on the backend | 17:42 |
sean-k-mooney | but we have never officially supported glusterfs as far as i recall | 17:42 |
sean-k-mooney | spatel: there are a bunch of issues with locks and move operations that the latency of shared storage can cause | 17:43 |
sean-k-mooney | so in general we do not support it | 17:43 |
spatel | we have 800TB of glusterfs and it would be bad if i had to mount that on some foo server and then mount it on 50 compute nodes | 17:43 |
spatel | That is interesting... | 17:44 |
sean-k-mooney | in general if you want to use glusterfs with openstack the only supported way to do it would be via cinder | 17:44 |
sean-k-mooney | it can work the way you have it deployed but it's not a tested configuration and was never explicitly supported | 17:44 |
spatel | maybe i am the first use case here.. :) i would still like to push ahead and try it out because we already have a large storage pool which i would like to use. | 17:46 |
spatel | if i see any (major) issue then i may switch to NFS | 17:46 |
spatel | If i go with cinder then it's not going to be shared storage, correct? will live migration be supported with a cinder-based deployment? | 17:47 |
sean-k-mooney | spatel: well, just a word of warning: if we could remove support for this capability entirely we would | 17:48 |
sean-k-mooney | we don't like supporting nfs-backed /var/lib/nova | 17:48 |
sean-k-mooney | we only maintain support because we have some old large clouds that use it, but this area of the code is not actively maintained | 17:48 |
spatel | hmm! so in short, no to shared storage except ceph, correct? | 17:49 |
sean-k-mooney | basically, and even for ceph, not by using cephfs | 17:49 |
spatel | ceph rbd | 17:49 |
sean-k-mooney | the problem is that when using shared storage like this we are actually using a local storage driver in nova | 17:49 |
sean-k-mooney | e.g. images_type=qcow2 | 17:50 |
spatel | i know what you are saying.. mounting nova on shared storage can have all kinds of issues | 17:50 |
sean-k-mooney | and have sprinkled some checks in random parts of the code to do something else if we detect it's on shared storage | 17:50 |
sean-k-mooney | the main ones come up around move operations and evacuate | 17:50 |
sean-k-mooney | e.g. we need to make sure we don't overwrite the image if we do a cold/live migration | 17:51 |
sean-k-mooney | and don't recreate it if we evacuate | 17:51 |
sean-k-mooney | to prevent losing data | 17:51 |
spatel | hmm | 17:51 |
spatel | so what options do i have in the current design with glusterfs? | 17:52 |
spatel | cinder boot volume? | 17:52 |
sean-k-mooney | if the sync time between writing a file on one host and reading it on another is long enough we might not detect it's on shared storage, which can be bad | 17:52 |
sean-k-mooney | cinder boot from volume would work, yes | 17:52 |
spatel | does cinder support glusterfs? | 17:53 |
sean-k-mooney | what you are doing can work, just be aware that there are dragons when you do this. | 17:53 |
sean-k-mooney | em, it used to | 17:53 |
spatel | I understand. this cluster is not going to hold critical data. this is for crunching some data and giving dynamic results | 17:54 |
sean-k-mooney | em, it's available for cinder backup | 17:54 |
sean-k-mooney | https://docs.openstack.org/cinder/latest/drivers.html#glusterfsbackupdriver | 17:54 |
sean-k-mooney | not sure about cinder in general | 17:54 |
spatel | let me do some research and see what i can do | 17:55 |
sean-k-mooney | in general it looks like no | 17:56 |
spatel | oh boy :( | 17:56 |
sean-k-mooney | we do not have a volume driver in nova for gluster | 17:56 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L176-L191 | 17:56 |
sean-k-mooney | so it can be used for backups but not vms | 17:56 |
spatel | I think i should try NFS now because folks are using it.. | 17:57 |
spatel | i know you hate it but i need something to move forward with, and i'll deal with issues later. :) | 17:57 |
spatel | we have a plan to buy dedicated ceph storage, but not today, maybe after a few months | 17:58 |
sean-k-mooney | ack, the way we detect shared storage https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L10893-L10912 | 17:59 |
sean-k-mooney | in principle should work with gluster | 17:59 |
sean-k-mooney | but i'm not sure why it would not delete the instance directory | 17:59 |
sean-k-mooney | although i don't think that is the only way we check | 18:00 |
sean-k-mooney | spatel: we have things like this | 18:00 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1723-L1731 | 18:00 |
spatel | sean-k-mooney does nova have some timer etc.. where it waits and then deletes? | 18:01 |
spatel | I am planning to turn on debug to see what is going on | 18:01 |
sean-k-mooney | well i think part of the delete will be done by libvirt | 18:02 |
sean-k-mooney | this is where we do some of the cleanup https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1502 | 18:02 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L11641-L11697 | 18:02 |
spatel | hmm.. is there any special config which i may have missed? | 18:02 |
sean-k-mooney | look for LOG.info('Deletion of %s failed', remaining_path, | 18:03 |
sean-k-mooney | instance=instance) | 18:03 |
sean-k-mooney | spatel: you could try https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L10619-L10647 | 18:06 |
sean-k-mooney | it's controlled by instance_delete_interval | 18:06 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.instance_delete_interval | 18:07 |
sean-k-mooney | spatel: so nova will try to delete the instance every 5 minutes and then retry up to 5 times | 18:08 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.maximum_instance_delete_attempts | 18:08 |
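Both options mentioned above live in the [DEFAULT] section of nova.conf; a quick way to see whether a deployment overrides them (the /etc/nova/nova.conf path is an assumption, it varies by installer) is:

```shell
# Show any explicit overrides of the two periodic-delete options.
# If nothing is printed, the documented defaults apply:
#   instance_delete_interval = 300        (seconds between cleanup runs)
#   maximum_instance_delete_attempts = 5  (retries before giving up)
grep -E '^(instance_delete_interval|maximum_instance_delete_attempts)' /etc/nova/nova.conf
```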
spatel | let me check that setting and wait.. i can turn on debug also | 18:09 |
sean-k-mooney | spatel: check the compute agent log for that info message first | 18:09 |
sean-k-mooney | to confirm that nova failed to delete the instance files | 18:09 |
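To confirm that message, something like the following grep works; the log path is an assumption and depends on how nova-compute is deployed:

```shell
# Look for the "Deletion of %s failed" info message from the libvirt driver.
grep "Deletion of" /var/log/nova/nova-compute.log
```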
spatel | give me a few minutes to collect some logs | 18:11 |
sean-k-mooney | no worries | 18:15 |
spatel | sean-k-mooney as a quick check i found this - https://paste.opendev.org/show/812229/ | 18:28 |
spatel | libvirt is trying to delete but it keeps failing.. very odd | 18:29 |
spatel | now going to turn on debug and see | 18:29 |
sean-k-mooney | ya that is the message i was expecting | 18:30 |
sean-k-mooney | so like nfs https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L11079 it looks like deletion on gluster is also unreliable | 18:31 |
spatel | https://paste.opendev.org/show/812230/ | 18:32 |
spatel | This is odd, when i tried to delete that by hand i got the above error | 18:33 |
sean-k-mooney | so ya looks like a gluster issue | 18:33 |
spatel | but it deleted files inside that directory | 18:33 |
spatel | I am going to talk to someone who owns that storage and see what is going on | 18:33 |
sean-k-mooney | ack | 18:33 |
spatel | Thanks for checking | 18:34 |
sean-k-mooney | i'm not sure how gluster tracks files vs directories | 18:34 |
sean-k-mooney | but it may handle directory inodes differently | 18:34 |
spatel | https://paste.opendev.org/show/812231/ | 18:34 |
spatel | something is wrong with gluster for sure.. i am able to delete files but not directories | 18:35 |
sean-k-mooney | ack | 18:35 |
sean-k-mooney | ya it could also be fuse | 18:35 |
sean-k-mooney | are you mounting glusterfs with fuse or using a kernel driver | 18:36 |
spatel | 10.10.217.21:gluster_vol2/voyager /mnt/glusterfs glusterfs defaults,backup-volfile-servers=10.10.217.22:10.10.217.23:10.10.217.24:10.10.217.25:10.10.217.26 | 18:37 |
spatel | fuse.. hmm ? | 18:37 |
spatel | i just installed the glusterfs-client RPM package and used the mount tool to mount it | 18:37 |
sean-k-mooney | i think there is a user space driver and a kernel driver for gluster like there is for ceph | 18:37 |
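The mount type is easy to confirm from the compute node; a GlusterFS client mount made through the glusterfs-client/FUSE package normally shows up with filesystem type fuse.glusterfs (the /mnt/glusterfs mount point is taken from the fstab line above):

```shell
# Show the filesystem type of the gluster mount point.
df -T /mnt/glusterfs
# Or list the mount entry itself; a FUSE-based gluster mount reports
# "type fuse.glusterfs" rather than a native kernel filesystem type.
mount | grep -i gluster
```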
spatel | let me check.. that is a good point | 18:38 |
sean-k-mooney | https://bugzilla.redhat.com/show_bug.cgi?id=1508999 | 18:39 |
spatel | Reading the bug report, looks interesting | 18:43 |
sean-k-mooney | i have not looked at it closely but the title seemed relevant | 18:44 |
spatel | i am having the same issue but trying to understand how they fixed it. i don't control gluster so not sure, but i can explain it to someone | 18:44 |
sean-k-mooney | i would start with your simple reproducer | 18:45 |
sean-k-mooney | e.g. create the dir and show you can't delete it | 18:45 |
sean-k-mooney | that really should work | 18:45 |
sean-k-mooney | one thing to check: is it any directory or just ones at the root of the volume | 18:46 |
sean-k-mooney | i.e. does mkdir -p .../temp/mydata followed by rm -rf .../temp/mydata work | 18:47 |
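Spelled out as a standalone reproducer (using /mnt/glusterfs from the fstab line above as the assumed mount point), the test looks roughly like this:

```shell
# Reproduce "files delete fine, directories don't" on the gluster mount.
cd /mnt/glusterfs
mkdir -p temp/mydata
touch temp/mydata/afile
rm temp/mydata/afile     # file deletion reportedly works
rmdir temp/mydata        # does directory removal fail here?
rmdir temp
```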
spatel | let me check that | 18:51 |
spatel | any directory in the tree, period | 18:52 |
sean-k-mooney | ack but it can delete files | 18:52 |
sean-k-mooney | that is very odd indeed | 18:53 |
spatel | i am able to delete files no matter where they are located but not able to delete any dir | 18:53 |
spatel | Yes i can delete files.. | 18:53 |
spatel | not directory | 18:53 |
sean-k-mooney | ya i have never seen that before honestly, unless the permissions of the folders vs the files are different | 18:53 |
sean-k-mooney | but the error implies it's an internal gluster issue, not a simple permissions one | 18:54 |
spatel | I asked someone to take a look | 18:54 |
spatel | sean-k-mooney i have asked someone to take a look so hopefully they find something | 20:32 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Support use_multipath for NVME driver https://review.opendev.org/c/openstack/nova/+/823941 | 21:34 |
clarkb | sean-k-mooney: fyi https://git.centos.org/rpms/systemd/c/3d3dc89fb25868e8038ecac8d5aef0603bdfaaa2?branch=c8s was recently committed. I don't know how/when/if that will become a package in the package repos, but it's progress | 23:02 |
*** dasm is now known as dasm|off | 23:33 | |
sean-k-mooney | ack, i think it should go through the koji automated build automatically once the commit lands in dist-git | 23:39 |
sean-k-mooney | clarkb: so i would expect that to show up relatively quickly once it's committed | 23:39 |
sean-k-mooney | clarkb: https://koji.mbox.centos.org/koji/buildinfo?buildID=20898 there is the attempted build | 23:45 |
sean-k-mooney | looks like it failed | 23:46 |
sean-k-mooney | 155/298 test-procfs-util FAIL 0.32s killed by signal 6 SIGABRT | 23:49 |
sean-k-mooney | ――――――――――――――――――――――――――――――――――――― ✀ ――――――――――――――――――――――――――――――――――――― | 23:50 |
sean-k-mooney | stderr: | 23:50 |
sean-k-mooney | Current system CPU time: 5month 4w 4h 23min 16.380000s | 23:50 |
sean-k-mooney | Current memory usage: 34.6G | 23:50 |
sean-k-mooney | Current number of tasks: 681 | 23:50 |
sean-k-mooney | kernel.pid_max: 40960 | 23:50 |
sean-k-mooney | kernel.threads-max: 1030309 | 23:50 |
sean-k-mooney | Limit of tasks: 40959 | 23:50 |
sean-k-mooney | Reducing limit by one to 40958… | 23:50 |
clarkb | I guess a failed build is still progress | 23:50 |
sean-k-mooney | procfs_tasks_set_limit: Permission denied | 23:50 |
sean-k-mooney | Assertion 'r >= 0 ? w == v - 1 : w == v' failed at ../src/test/test-procfs-util.c:59, function main(). Aborting. | 23:50 |
sean-k-mooney | ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― | 23:50 |
sean-k-mooney | ya not really sure why that failed to be honest | 23:50 |
sean-k-mooney | there is one failure out of around 300 build tests | 23:50 |
sean-k-mooney | so it's mostly fine | 23:51 |
sean-k-mooney | the build logs are here https://koji.mbox.centos.org/koji/taskinfo?taskID=334490 | 23:51 |
sean-k-mooney | in case you're interested | 23:51 |
sean-k-mooney | hopefully it's just a buggy test and a rebuild will fix it, but in any case hopefully it will get addressed soon | 23:52 |
sean-k-mooney | i'm pretty sure i do not have an account that can retrigger that on that koji instance so i'll just have to wait and see, but i can link the failed build on the bugzilla bug | 23:53 |
clarkb | ya I'm not really in a hurry myself, more just trying to follow along since we get semi-regular questions about it. Though those have died down recently. I think people just know ping is broken now | 23:54 |
sean-k-mooney | ack, https://bugzilla.redhat.com/show_bug.cgi?id=2037807#c10 i commented on the bug. i'll check it again tomorrow but at least people are aware of the issue | 23:58 |