f0o_ | jrosser: regarding https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/prepare_nfs.yml#L81 - how do you force nova/cinder/glance to all use UID/GID 10000 in this case? | 06:24 |
f0o_ | also perhaps more interestingly, why did this become an issue now and not much earlier... this was running for quite a while without issues... | 06:26 |
f0o_ | I guess I have to override nova_system_user_uid/group_gid as well as cinder's and glance's? it does say that changing these is not really supported... hrm | 06:33 |
jrosser | f0o_: i would not expect that you have to override those, otherwise surely the ansible code would have some special case for nfs backend and deal with it for you | 07:03 |
jrosser | all_squash and the anonymous uid/gid will only take effect for newly written items, not existing ones, on your nfs server | 07:03 |
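A minimal sketch of the squashed export under discussion, as an Ansible task; the path, client spec, and handler name are placeholders, not the actual prepare_nfs.yml contents:

```yaml
# Hedged sketch: all_squash maps every client UID/GID to the anonymous
# 10000:10000 pair, but only for files written after the option takes effect;
# existing files on the NFS server keep their current owners.
- name: Configure a squashed NFS export
  ansible.builtin.lineinfile:
    path: /etc/exports
    line: "/srv/nfs *(rw,sync,no_subtree_check,all_squash,anonuid=10000,anongid=10000)"
  notify: Reload NFS exports   # placeholder handler that runs 'exportfs -ra'
```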
f0o_ | I know, but the new files will be UID/GID 10000, which cinder/glance/nova/libvirt don't have access to - so that's failing | 07:09 |
f0o_ | so unless I override the UID/GID of cinder/nova/glance and disable dynamic_ownership in /etc/libvirt/qemu.conf I don't see how it should work | 07:09 |
jrosser | why not build an AIO with the reference nfs deployment | 07:11 |
f0o_ | because I've got 20 instances in error state right now, and both nova and cinder are throwing permission denied after the run-upgrade on staging, which mirrors production. So this issue will be present in our current prod, and I'd like to understand what I can do to mitigate it prior to running the upgrade | 07:13 |
f0o_ | having an AIO that doesn't have to deal with X hosts' UID/GID deviations doesn't really represent the reality where some LXC containers have different UID/GIDs now | 07:13 |
jrosser | ok sorry, i'll leave you to it | 07:14 |
f0o_ | I just fail to see how the AIO would help me see the requirement (or lack thereof) for those UID/GID overrides | 07:14 |
f0o_ | but really I'm just baffled that this actually became an issue out of seemingly nowhere. We were able to create and migrate instances without issues, so the perm problem should've hit us much earlier. It just feels like something changed fundamentally | 07:15 |
jrosser | there will be no matching of the uid/gid in the AIO between the lxc containers and the nfs server running on the host | 07:20 |
jrosser | that would be a pretty good representation of your current situation and allow you to see how it either does, or does not, work properly in the configuration we test in CI | 07:21 |
jrosser | if it doesn't work properly, that would be easily reproducible and the chances of getting it fixed would be pretty high | 07:22 |
jrosser | the setting of all_squash should be the thing that permits deviations of uid/gid across hosts | 07:23 |
jrosser | oh well also there is this https://github.com/openstack/openstack-ansible-os_nova/blob/3d385e9d3f96d51957e6b8b5bec91d13f93cd725/defaults/main.yml#L86-L96 | 07:27 |
f0o_ | let me try to figure out how I can mount the NFS through VPN and give it a shot then - all_squash does force UID/GID 10000 on all new files, but the issue is that the user who created those files has no perms because it's running as 999:999 (cinder) or 997:997 (nova) | 07:27 |
f0o_ | yeah those are the overrides I mentioned earlier after googling - that warning should deffo be a bit more prominent | 07:28 |
jrosser | it is in the docs for the nova role https://docs.openstack.org/openstack-ansible-os_nova/latest/configure-nova.html#shared-storage-and-synchronized-uid-gid | 07:32 |
f0o_ | >> These values should only be set once before deploying an OpenStack environment and then never changed. - Into the rabbit hole I go :D | 07:32 |
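The overrides under discussion, as a hedged sketch for /etc/openstack_deploy/user_variables.yml; the nova variable names come from the role defaults linked above, while the cinder/glance names assume those roles expose analogous variables:

```yaml
# Set once, before the first deploy, per the warning in the nova role docs;
# changing them on an existing environment is documented as unsupported.
nova_system_user_uid: 10000
nova_system_group_gid: 10000
# Assumed analogous variables in the os_cinder and os_glance roles:
cinder_system_user_uid: 10000
cinder_system_group_gid: 10000
glance_system_user_uid: 10000
glance_system_group_gid: 10000
```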
jrosser | yeah - i mean clearly you could change those, it's just quite some work and quite possibly some downtime whilst you do it | 07:33 |
f0o_ | do you happen to know which ansible role changes /etc/libvirt/qemu.conf? doesn't seem to be os-nova | 07:34 |
jrosser | i had a look at the nova and cinder roles and really nothing about this has changed for many years | 07:34 |
f0o_ | nvmd it was os-nova https://github.com/openstack/openstack-ansible-os_nova/blob/3d385e9d3f96d51957e6b8b5bec91d13f93cd725/tasks/drivers/kvm/nova_compute_kvm.yml#L84 | 07:34 |
jrosser | so i can only think that something inside nova/cinder/libvirt/wherever is now tighter or more specific with the permissions used at runtime | 07:35 |
f0o_ | I'm starting to believe that it's libvirt's dynamic_ownership, which seems to default to 1 now - so it will change the ownership of the volumes to libvirt:kvm (or equivalent), which deviates from nova/cinder | 07:36 |
f0o_ | so maybe I can just specify qemu_conf_dict entries to set it to 0... gonna give it a shot | 07:38 |
jrosser | yeah the comment is slightly wrong here https://opendev.org/openstack/openstack-ansible-os_nova/src/branch/master/defaults/main.yml#L582 | 07:40 |
jrosser | it lets you add additional config fields | 07:40 |
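A minimal sketch of the override f0o_ is proposing, placed in user_variables.yml; the exact value rendering depends on how the role templates qemu.conf:

```yaml
# Stop libvirt from chowning disk images to its own user at runtime.
qemu_conf_dict:
  dynamic_ownership: 0
```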
jrosser | if you think that we should change the defaults here please either make a patch or submit a bug report | 07:42 |
f0o_ | I think a safe default for this could be `user = nova` `group = nova` - then libvirt/qemu will run as nova:nova and dynamic_ownership will just chown all volumes to nova:nova if needed - if we assume that nova:nova == cinder:cinder == glance:glance for NFS then all perm issues are resolved | 07:43 |
f0o_ | the drawback is running qemu as nova:nova, which may or may not be a can of worms for apparmor/selinux policies | 07:43 |
jrosser | unfortunately the active contributors to openstack-ansible are mostly using ceph so we don't get much real feedback on other storage backends | 07:44 |
f0o_ | I guess the alternative is to add the default libvirt user (which unfortunately differs across distros) into the nova group, specify dynamic_ownership=0, and rely on all volumes being chmod 660 at least (through umask or similar) | 07:45 |
f0o_ | not sure which path is "best" | 07:45 |
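Alternative #2 as a sketch, assuming a Debian-family compute host where the qemu process runs as libvirt-qemu; the task and group mechanics here are an illustration, not part of any role:

```yaml
# Put the distro's qemu user in the nova group and keep dynamic_ownership
# off (see the qemu_conf_dict sketch above), relying on group-writable files.
- name: Add the qemu user to the nova group
  ansible.builtin.user:
    name: libvirt-qemu   # typically "qemu" on RHEL-family distros
    groups: nova
    append: true
```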
jrosser | no indeed, there are lots of moving parts and it would be easy to break something else | 07:46 |
f0o_ | ok so my nova is actually part of the kvm group, which libvirt-qemu is also in. So alternative #2 feels more reasonable | 07:47 |
jrosser | https://github.com/openstack/openstack-ansible-os_nova/blob/master/tasks/drivers/kvm/nova_compute_kvm.yml#L38-L45 | 07:53 |
f0o_ | ideally glance/cinder/nova/libvirt would accept an additional group as a configuration parameter so all NFS stuff could be owned by that group. Then there would be no conflicts with preexisting groups or SELinux/AppArmor based on deviating custom UID/GIDs... | 07:59 |
f0o_ | but that seems like a very ugly patch | 08:00 |
f0o_ | at least for glance the uid/gid change was relatively seamless | 08:19 |
f0o_ | gonna see how cinder/nova goes | 08:19 |
f0o_ | opted for running qemu as nova:nova and without dynamic_ownership - hoping apparmor plays ball with it | 08:20 |
f0o_ | if it does then this could be a simple documentation issue where NFS users need to set custom UID/GID and qemu settings | 08:20 |
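The combination f0o_ opted for, as a hedged user_variables.yml sketch; the nested quoting assumes the role renders values into qemu.conf verbatim, where libvirt expects quoted strings:

```yaml
# Run qemu as nova:nova and disable libvirt's runtime chown of volumes.
qemu_conf_dict:
  user: '"nova"'
  group: '"nova"'
  dynamic_ownership: 0
```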
f0o_ | alright this seems to be working pretty well now | 09:08 |
f0o_ | https://paste.opendev.org/show/bCOVeSboEuLh0CqeDUVo/ | 09:08 |
f0o_ | with the all_squash exports config to make all writes uid/gid 10000 | 09:08 |
f0o_ | all instances are running as nova now and there's no perm issues anymore | 09:09 |
f0o | should https://github.com/openstack/openstack-ansible-os_cinder/blob/stable/2024.1/defaults/main.yml#L35 be master? | 17:38 |
f0o | I've got nearly everything operational, just one cinder-volume instance keeps having an issue with the new quorum queues... it keeps crashing with a precondition failure on durable/auto_delete, which AFAIK was deprecated in favor of quorum queues... I've verified the rabbitmq settings against a working node but this one just won't start... idk if I'm just missing some dependency somewhere | 17:43 |
f0o | anyway tomorrow's issue | 17:44 |