Sunday, 2024-08-04

06:24 <f0o_> jrosser: regarding https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/prepare_nfs.yml#L81 - how do you force nova/cinder/glance to all use UID/GID 10000 in this case?
06:26 <f0o_> also, perhaps more interestingly, why did this become an issue now and not much earlier? this was running for quite a while without issues...
06:33 <f0o_> I guess I have to override nova_system_user_uid/group_gid as well as cinder's and glance's? it does say that changing these is not really supported... hrm
07:03 <jrosser> f0o_: i would not expect that you have to override those, otherwise surely the ansible code would have some special case for the nfs backend and deal with it for you
07:03 <jrosser> all_squash and the anonymous uid/gid will only take effect for newly written items, not existing ones, on your nfs server
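For context, the squashing being discussed is set per-export on the NFS server. A minimal sketch of an /etc/exports entry that maps every client write to the anonymous UID/GID 10000, in line with the linked prepare_nfs.yml - the path and client range here are placeholders, not taken from the log:

    /srv/nfs/cinder 10.0.0.0/24(rw,sync,no_subtree_check,all_squash,anonuid=10000,anongid=10000)

After editing, `exportfs -ra` on the server reloads the export table.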
07:09 <f0o_> I know, but the new files will be UID/GID 10000, which cinder/glance/nova/libvirt don't have access to - so that's failing
07:09 <f0o_> so unless I override the UID/GID of cinder/nova/glance and disable dynamic_ownership in /etc/libvirt/qemu.conf I don't see how it should work
07:11 <jrosser> why not build an AIO with the reference nfs deployment?
07:13 <f0o_> because I got 20 instances in error state right now and both nova and cinder are throwing perm denied after the run-upgrade, which was the staging for production. So this issue will be present in our current prod and I'd like to understand what I can do to mitigate it prior to running the upgrade
07:13 <f0o_> having an AIO that doesn't have to deal with X hosts' UID/GID deviances doesn't really represent the reality where some LXC containers have different UID/GIDs now
07:14 <jrosser> ok sorry, i'll leave you to it
07:14 <f0o_> I just fail to see how the AIO would help me see the requirement (or lack thereof) for those UID/GID overrides
07:15 <f0o_> but really I'm just baffled that this actually became an issue out of seemingly nowhere. We were able to create and migrate instances without issues, so the perm problem should've hit us much earlier. It just feels like something changed fundamentally
07:20 <jrosser> there will be no matching of the uid/gid in the AIO between the lxc containers and the nfs server running on the host
07:21 <jrosser> that would be a pretty good representation of your current situation and allow you to see how it either does, or does not, work properly in the configuration we test in CI
07:22 <jrosser> if it doesn't work properly that would be easily reproducible and the chances of getting that fixed would be pretty high
07:23 <jrosser> the setting of all_squash should be the thing that permits deviations of uid/gid across hosts
07:27 <jrosser> oh, also there is this https://github.com/openstack/openstack-ansible-os_nova/blob/3d385e9d3f96d51957e6b8b5bec91d13f93cd725/defaults/main.yml#L86-L96
07:27 <f0o_> let me try to figure out how I can mount the NFS through VPN and give it a shot then - all_squash does force UID/GID 10000 on all new files, but the issue is that the user who created those files has no perms because it's running as 999:999 (cinder) or 997:997 (nova)
07:28 <f0o_> yeah, those are the overrides I mentioned earlier after googling - that warning should definitely be a bit more prominent
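For reference, those overrides would go into user_variables.yml before the first deployment. A sketch, where the nova variable names come from the defaults linked above and the cinder/glance analogues are assumptions, not verified here:

    # user_variables.yml - set once, before initial deployment, then never changed
    nova_system_user_uid: 10000
    nova_system_group_gid: 10000
    cinder_system_user_uid: 10000     # assumed analogue in os_cinder
    cinder_system_group_gid: 10000
    glance_system_user_uid: 10000     # assumed analogue in os_glance
    glance_system_group_gid: 10000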
07:32 <jrosser> it is in the docs for the nova role https://docs.openstack.org/openstack-ansible-os_nova/latest/configure-nova.html#shared-storage-and-synchronized-uid-gid
07:32 <f0o_> >> These values should only be set once before deploying an OpenStack environment and then never changed. - Into the rabbit hole I go :D
07:33 <jrosser> yeah - i mean clearly you could change those, it's just quite some work and quite possibly some downtime whilst you do it
07:34 <f0o_> do you happen to know which ansible role changes /etc/libvirt/qemu.conf? doesn't seem to be os-nova
07:34 <jrosser> i had a look at the nova and cinder roles and really nothing about this has changed for many years
07:34 <f0o_> never mind, it was os-nova https://github.com/openstack/openstack-ansible-os_nova/blob/3d385e9d3f96d51957e6b8b5bec91d13f93cd725/tasks/drivers/kvm/nova_compute_kvm.yml#L84
07:35 <jrosser> so i can only think that something inside nova/cinder/libvirt/wherever is now tighter or more specific with the permissions used at runtime
07:36 <f0o_> I'm starting to believe that it's libvirt's dynamic_ownership, which seems to default to 1 now - so it will change the ownership of the volumes to libvirt:kvm (or equivalent), which deviates from nova/cinder
07:38 <f0o_> so maybe I can just specify qemu_conf_dict entries to set it to 0... gonna give it a shot
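A minimal sketch of that override, assuming entries in qemu_conf_dict are rendered verbatim into /etc/libvirt/qemu.conf (the variable lives in the os_nova defaults linked just below):

    # user_variables.yml - a sketch, not a tested configuration
    qemu_conf_dict:
      dynamic_ownership: 0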
07:40 <jrosser> yeah, the comment is slightly wrong here https://opendev.org/openstack/openstack-ansible-os_nova/src/branch/master/defaults/main.yml#L582
07:40 <jrosser> it lets you add additional config fields
07:42 <jrosser> if you think that we should change the defaults here please either make a patch or submit a bug report
07:43 <f0o_> I think a safe default for this could be `user = nova` `group = nova` - then libvirt/qemu will run as nova:nova and dynamic_ownership will just chown all volumes to nova:nova if needed - if we assume that nova:nova == cinder:cinder == glance:glance for NFS then all perm issues are resolved
07:43 <f0o_> the drawback is running qemu as nova:nova, which may or may not be a can of worms for apparmor/selinux policies
07:44 <jrosser> unfortunately the active contributors to openstack-ansible are mostly using ceph so we don't get much real feedback on other storage backends
07:45 <f0o_> I guess the alternative is to add the default libvirt user (which unfortunately differs across distros) to the nova group, specify dynamic_ownership=0, and rely on all volumes being at least chmod 660 (through umask or similar)
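A sketch of that second alternative on a Debian/Ubuntu compute host, where the default libvirt/qemu user is libvirt-qemu (the user name is distro-specific, as noted above):

    # add the distro's qemu user to the nova group ...
    usermod -a -G nova libvirt-qemu
    # ... and keep libvirt from chowning volumes at runtime, via qemu.conf:
    #   dynamic_ownership = 0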
07:45 <f0o_> not sure which path is "best"
07:46 <jrosser> no indeed, there are lots of moving parts and it would be easy to break something else
07:47 <f0o_> ok so my nova user is actually part of the kvm group, which libvirt-qemu is also in. So alternative #2 feels more reasonable
07:53 <jrosser> https://github.com/openstack/openstack-ansible-os_nova/blob/master/tasks/drivers/kvm/nova_compute_kvm.yml#L38-L45
07:59 <f0o_> ideally glance/cinder/nova/libvirt would accept an additional group as a configuration parameter so all NFS stuff could be owned by that group. Then there are no conflicts with preexisting groups or SELinux/AppArmor based on deviating custom UID/GIDs...
08:00 <f0o_> but that seems like a very ugly patch
08:19 <f0o_> at least for glance the uid/gid change was relatively seamless
08:19 <f0o_> gonna see how cinder/nova goes
08:20 <f0o_> opted for running qemu as nova:nova and without dynamic_ownership - hoping apparmor plays ball with it
08:20 <f0o_> if it does, then this could be a simple documentation issue where NFS users need to set custom UID/GIDs and qemu settings
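The end state described above would leave /etc/libvirt/qemu.conf looking roughly like this (a sketch of the described configuration, not a copy of the paste below):

    # /etc/libvirt/qemu.conf
    user = "nova"
    group = "nova"
    dynamic_ownership = 0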
09:08 <f0o_> alright, this seems to be working pretty well now
09:08 <f0o_> https://paste.opendev.org/show/bCOVeSboEuLh0CqeDUVo/
09:08 <f0o_> with the all_squash exports config to make all writes uid/gid 10000
09:09 <f0o_> all instances are running as nova now and there are no perm issues anymore
17:38 <f0o> should https://github.com/openstack/openstack-ansible-os_cinder/blob/stable/2024.1/defaults/main.yml#L35 be master?
17:43 <f0o> I got nearly everything operational, just one cinder-volume instance keeps having an issue with the new quorum queues... it keeps crashing with a precondition failure on durable/auto_delete, which AFAIK was deprecated in favor of quorum queues... I have verified the rabbitmq settings against a working node but this one just won't start... idk if I'm just missing some dependency somewhere
17:44 <f0o> anyway, tomorrow's issue
