opendevreview | Merged openstack/nova master: db: Remove legacy migrations https://review.opendev.org/c/openstack/nova/+/872428 | 01:08 |
---|---|---|
Uggla | sean-k-mooney, gibi, bauzas I'm facing an issue. I can mount a share on the compute if I'am an admin, but not a regular user. I think it is something around privsep but I did not figure what it could be. Any idea what could be the issue ? | 09:05 |
gibi | Uggla: how this fails? | 09:06 |
Uggla | I receive this error as a regular user : mount.nfs: mounting 192.168.122.76:/opt/stack/data/manila/mnt/share-7643c9cb-bf52-4263-b048-2b3589451cd9 failed, reason given by server: No such file or directory\n' | 09:07 |
Uggla | however the directory exists. | 09:07 |
Uggla | as an example using the mount -t nfs 192.168.122.76:/opt/stack/data/manila/mnt/share-7643c9cb-bf52-4263-b048-2b3589451cd9 /opt/stack/data/nova/mnt/57c1c5f49444e0ccdcd1d46b5d640827 command as root is ok. | 09:08 |
opendevreview | Stephen Finucane proposed openstack/nova master: doc: Update version info https://review.opendev.org/c/openstack/nova/+/880614 | 09:09 |
Uggla | as root | 09:09 |
Uggla | gibi, https://paste.opendev.org/show/bypnmFjxs469LqyF1nsB/ | 09:15 |
bauzas | Uggla: I'm unclear, is the mount failing ? | 09:16 |
Uggla | bauzas, yes the mount is failing if I am an openstack "regular" user (demo). Not if I am an admin. | 09:18 |
bauzas | I see | 09:18 |
bauzas | which service is doing the mount ? | 09:19 |
bauzas | nova-compute right? | 09:19 |
Uggla | bauzas, compute | 09:19 |
Uggla | yes | 09:19 |
* bauzas needs to look at your driver code | 09:19 | |
bauzas | sec | 09:19 |
gibi | it is strange the mount call does not take the ctx so I'm not sure how this code can depend on the user token | 09:20 |
bauzas | I'm confused, I'm trying to see where we internally syscall mount | 09:23 |
Uggla | gibi, hopefully I have a witness (artom) prooving I'm not totally mad. ;) | 09:23 |
bauzas | ah, found it | 09:24 |
bauzas | in https://review.opendev.org/c/openstack/nova/+/833090/29/nova/virt/libvirt/driver.py#4115 | 09:24 |
bauzas | you're calling nfs.LibvirtNFSVolumeDriver with connect_volume() | 09:25 |
bauzas | Uggla: amirite ? | 09:25 |
Uggla | yes | 09:26 |
Uggla | then it is calling mount --> privsep --> fs | 09:26 |
bauzas | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume/fs.py#L113 | 09:27 |
bauzas | found it | 09:27 |
bauzas | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume/mount.py#L262 this is what's eventually called | 09:28 |
bauzas | which indeed calls a prevsep helper https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume/mount.py#L308 | 09:29 |
bauzas | and yeah look https://github.com/openstack/nova/blob/master/nova/privsep/fs.py#L30 | 09:29 |
bauzas | oh wait | 09:33 |
Uggla | bauzas, yes and as gibi said there is no link with the context. That's the reason why I don't understand the issue. | 09:33 |
bauzas | Uggla: as a reminder, the privsep helper doesn't run as root | 09:33 |
bauzas | it runs as the nova user with sudo rights, which is different | 09:34 |
bauzas | iirc, rootwrap only runs privsep with https://github.com/openstack/nova/blob/master/etc/nova/rootwrap.d/compute.filters | 09:36 |
bauzas | and then the privsep client connects to the privsep socket | 09:37 |
bauzas | that's why I'm suspecting you miss some privsep capability | 09:37 |
Uggla | bauzas, hum you lost me. As privsep is a replacement for rootwrap in my mind. | 09:39 |
bauzas | Uggla: privsep is a deamon https://github.com/openstack/oslo.privsep/blob/master/oslo_privsep/daemon.py | 09:41 |
bauzas | and iirc, we start it thru rootwrap https://github.com/openstack/oslo.privsep/blob/master/oslo_privsep/daemon.py#L32 | 09:41 |
bauzas | then, the privsep client connects to the deamon which has escalated rights | 09:42 |
gibi | bauzas: if the same sequence works with and admin user token but not with the demo user token then I don't think this is a privsep capability issue but I have no better idea :/ | 09:42 |
bauzas | gibi: I haven't understand the same | 09:42 |
bauzas | I thought Uggla was saying that the mount wasn't working with a nova user, but a mount command was working fine with sudo righrs | 09:43 |
bauzas | hence me thinking this was about a missing cap | 09:43 |
bauzas | but agreed, if an admin API call can mount thru nova, then yeah, this is a keystone context problem and not a privsep one | 09:44 |
Uggla | bauzas, the sudo mount command was just to prove that it is not an issue with missing directory and that mount.nfs is possible on the host. And it works fine as an os admin and not as a os user. | 09:46 |
bauzas | ohok | 09:46 |
bauzas | my bad then | 09:46 |
bauzas | soooo, pdb it | 09:47 |
stephenfin | sqlalchemy-migrate is finally gone. Whoop! 🥳 | 09:48 |
bauzas | Uggla: have you checked that all the attributes for the privsep.mount call are the same, between a standard user and an admin user? | 09:48 |
stephenfin | Now to get the other projects to do the same | 09:48 |
stephenfin | Thanks for the reviews :) | 09:48 |
bauzas | stephenfin: kudos for the win. | 09:48 |
stephenfin | I hope nova jobs will start passing here shortly https://review.opendev.org/c/openstack/requirements/+/879743 | 09:49 |
Uggla | bauzas, unless if I'm wrong we are calling the same. | 09:49 |
Uggla | bauzas, if you wish I can show you a demo in the beginning of the afternoon. | 09:50 |
bauzas | Uggla: I've seen a couple of log debug messages, should be quick to verify | 09:50 |
bauzas | Uggla: sure | 09:50 |
songwenping | bauzas, sean-k-mooney: hi guys, i have attach nvidia gpu(v100/A100) to the vm with 'virsh attach-device --persistent --live' , i can find the gpu in the vm with lspci, but nvidia-smi cnanot find the gpu, with the error 'NVRM: GPU 0000:05:00.0: RmInitAdapter failed! ', do you have any advices? | 10:26 |
opendevreview | Rajesh Tailor proposed openstack/nova master: Fix case-sensitivity for metadata keys https://review.opendev.org/c/openstack/nova/+/873901 | 11:29 |
bauzas | songwenping: this is unfortunately not a nova issue, IMHO | 12:07 |
bauzas | if you can list the vgpu by lcpci, it looks to me a nvidia driver issue | 12:08 |
songwenping | bauzas: right, but where can we find some doc to prove it's nvidia driver issue? | 12:14 |
opendevreview | Stephen Finucane proposed openstack/placement master: db: Replace use of deprecated API https://review.opendev.org/c/openstack/placement/+/880623 | 12:24 |
opendevreview | Stephen Finucane proposed openstack/placement master: tests: Use base class for all functional tests https://review.opendev.org/c/openstack/placement/+/880624 | 12:24 |
opendevreview | Stephen Finucane proposed openstack/placement master: tests: Warn on *any* SAWarning warning https://review.opendev.org/c/openstack/placement/+/880625 | 12:24 |
Uggla | gibi, bauzas I have progressed a little. In fact I was fooled. The mount issue is not linked to user vs admin. But the first to the api is failing, but the second one is working, and this is not linked to timing. | 12:31 |
Uggla | s/first/first call/ | 12:32 |
bauzas | Uggla: let's discuss this directly by gmeet if you want ;) | 12:46 |
Uggla | ok | 12:47 |
Uggla | cleaning my env and calling you | 12:47 |
bauzas | songwenping: honestly, nova just adds a mdev on the libvirt XML https://libvirt.org/drvnodedev.html#mediated-devices-mdevs | 12:48 |
bauzas | Uggla: ok | 12:48 |
artom | Uggla, oh, the privsep mount thing | 13:22 |
artom | Yeah, it's super weird, I am indeed a witness | 13:22 |
opendevreview | Dan Smith proposed openstack/nova master: Remove silent failure to find a node on rebuild https://review.opendev.org/c/openstack/nova/+/880632 | 13:41 |
opendevreview | Dan Smith proposed openstack/nova master: Stop ignoring missing compute nodes in claims https://review.opendev.org/c/openstack/nova/+/880633 | 13:41 |
dansmith | bauzas: I pulled the RT stuff out into a separate set^ | 13:41 |
bauzas | dansmith: on a long call with Uggla but okay, I'll try | 13:42 |
dansmith | and fixed another silent failure we ignore in evacuate | 13:42 |
bauzas | gibi: dansmith: sean-k-mooney: fyi, we discussed with Uggla about his series and we discovered a potential large leak for ACLs in Manila access rights | 14:21 |
gibi | ack | 14:21 |
bauzas | gibi: dansmith: sean-k-mooney: when adding an access-allow for the share, we pass the compute IP address to Manila | 14:21 |
dansmith | which is visible by the user? | 14:21 |
bauzas | then the IP address can be seen by any user in the same project, and also any user can delete this ACL | 14:21 |
dansmith | that'd be bad | 14:22 |
bauzas | so we need to tell the Manila folks to somehow hide those details | 14:22 |
dansmith | yeah | 14:22 |
bauzas | if the ACL is done by a service, it shouldn't be seen by an enduser, neither be able to delete it | 14:23 |
bauzas | if we ask Manila to lock a share, that's a different concern | 14:23 |
sean-k-mooney | bauzas: this is partly why i wanted to use the cert auth method not the ip one | 14:23 |
bauzas | because users can delete ACLs without unlocking | 14:23 |
dansmith | ...yup | 14:24 |
bauzas | sean-k-mooney: it should also leak the certificate, right? | 14:24 |
bauzas | and users could also delete the ACL | 14:24 |
bauzas | there are two problems to resolve : 1/ we need to hide the ACL, 2/ we need to make sure it's not possible to delete it by an user | 14:25 |
Uggla | sean-k-mooney, also today you can do a manila show share that will reveal IP + export location | 14:25 |
sean-k-mooney | the cert would be a per vm cert generted for that insntace | 14:25 |
bauzas | we == Manila API | 14:25 |
sean-k-mooney | and it woudl only be the public key | 14:25 |
sean-k-mooney | so i dont think we care if that is visable | 14:25 |
dansmith | does nova use only the user's token to talk to mania? | 14:26 |
bauzas | sean-k-mooney: then we would need to extend the lock mechanism to deny any ACL modification | 14:26 |
dansmith | *manila | 14:26 |
sean-k-mooney | https://docs.openstack.org/api-ref/shared-file-system/?expanded=grant-access-detail#grant-access | 14:26 |
bauzas | dansmith: we tested this also with admin | 14:26 |
dansmith | I'm asking | 14:27 |
sean-k-mooney | i would prefer to use cert or user for auth then ip | 14:27 |
bauzas | dansmith: if you're an admin, you can generate a share and add an ACL | 14:27 |
bauzas | dansmith: but any user from the same project can both see the share and delete the ACL you created | 14:27 |
dansmith | bauzas: can you answer my question? | 14:27 |
dansmith | does nova use the user's token (only) to talk to manila? | 14:28 |
sean-k-mooney | dansmith: yes i belive we use only the user token in the current proposal | 14:28 |
bauzas | dansmith: IIRC today yes but Uggla knows better about it | 14:28 |
dansmith | ack | 14:28 |
sean-k-mooney | but we could add an admin token if we needed too. i dont think we use any admin api today | 14:28 |
dansmith | we have a lot of different places where we've done that in the past and it has come back to bite us | 14:28 |
sean-k-mooney | but we might for the lock api | 14:28 |
bauzas | we discussed this at the PTG, we could use a service token if Manila adds it | 14:28 |
bauzas | but again, we missed at the PTG the ACL problem | 14:28 |
dansmith | cinder, glance, etc.. so perhaps we should not replicate the same thing here | 14:28 |
sean-k-mooney | bauzas: the probelm is leakign the ip/subnet of the nova comptue host | 14:29 |
bauzas | since anyone from the same project can read anything about shares, then a service token needs to be related to a specific different project | 14:29 |
sean-k-mooney | the fix for that is to use a differnt grant type | 14:29 |
dansmith | I know that even if nova uses admin or service, manila probably needs extra work to not allow a user to delete/see that thing | 14:29 |
dansmith | but I'm saying.. using the user's token puts us at a disadvantage for actually filtering those things | 14:29 |
dansmith | sean-k-mooney: that may be better anyway, but I'm saying it might be a good idea to not go down this same road again | 14:30 |
gibi | so when nova adds the compute IP to a request sent to manila with the user token, nova basically leaked the IP of the compute to that user (and project) | 14:30 |
bauzas | sean-k-mooney: that's not only an IP leak problem, this is also a security problem since a malicious user can block the mount by deleting the ACL | 14:30 |
Uggla | sean-k-mooney, we need to take care even with certs, I'm not sure that manila will not expose the IP in the export location as well. | 14:30 |
dansmith | even if we use cert auth, we're leaking something about the compute node | 14:30 |
sean-k-mooney | bauzas: im aware but if we have locked the share you shoudl not be able to do that | 14:30 |
sean-k-mooney | so we can include that in the spec for the new lock api | 14:30 |
dansmith | two users can conspire to compare certs about their hosts if they can see them and determine if they're on the same compute node, for example | 14:31 |
bauzas | yup, + again, we allow anyone to block a mount | 14:31 |
dansmith | the cert probably has the hostname in it too right? | 14:31 |
sean-k-mooney | dansmith not if the cert is per instance | 14:31 |
bauzas | I think sean was proposing an instance-based cert | 14:31 |
sean-k-mooney | but otherwise yes | 14:31 |
bauzas | but that doesn't solve the ACL delete case | 14:31 |
dansmith | ack | 14:31 |
dansmith | right | 14:31 |
bauzas | and it adds a lot of complexity for little benefits | 14:31 |
bauzas | + I'm unsure that the export location doesn't show the IP address, even with certr | 14:32 |
dansmith | I think manila is going to need finer granularity for this sort of thing, and nova is going to have to not use the user's token (alone) to do this stuff if we're going to distinguish between the user and nova | 14:32 |
bauzas | dansmith: yup, and we need to round it back to the Manila team | 14:34 |
dansmith | yeah | 14:34 |
bauzas | thinking it more, the real problem is the fact that Manila exposes the export_location on the API for any user | 14:38 |
dansmith | well, for standalone usage it probably has to | 14:38 |
bauzas | sean-k-mooney: even with certs, the compute IP address would be shown in the export_location field of the manila share | 14:38 |
bauzas | tryue | 14:38 |
dansmith | however, I wonder if it should always only show a user their *own* exports | 14:38 |
bauzas | that's absolutely understandable from a standalone perspective | 14:38 |
sean-k-mooney | im not sure about that | 14:38 |
sean-k-mooney | but im in another meeting | 14:38 |
dansmith | that way, if nova uses its own user to create the export, the user would not be able to see it | 14:39 |
sean-k-mooney | so ill check back with this after | 14:39 |
bauzas | dansmith: another user, another project, but yeah | 14:41 |
bauzas | dansmith: it looks to me the granularity on CRUDs on per project | 14:42 |
dansmith | bauzas: well, either really | 14:42 |
dansmith | bauzas: we have per-user resources | 14:42 |
dansmith | but we need something | 14:42 |
dansmith | project would mean that the neutron user could see exports created by the nova user, which is probably not ever necessary | 14:42 |
dansmith | also not terrible, but.. | 14:42 |
bauzas | dansmith: well, we tested with Uggla, a devstack admin can create a share and add an access ACL, a devstack demo user can both delete the ACL and the share | 14:42 |
dansmith | wait, what? | 14:43 |
bauzas | that yeah | 14:43 |
dansmith | but why? | 14:43 |
bauzas | I'm saying again | 14:43 |
bauzas | f... no idea :) | 14:43 |
dansmith | that's fundamentally broken.. more than just "too coarse" right? | 14:43 |
bauzas | I'm devstack admin, I create a share and I add access to 1.2.3.4/32 | 14:44 |
bauzas | then, | 14:44 |
bauzas | I switch to demo | 14:44 |
dansmith | is manila in devstack configured with noauth or something/ | 14:44 |
bauzas | as demo, I can read that there is an existing share, I can read its ACL | 14:44 |
bauzas | and I can delete both | 14:44 |
dansmith | if that's true, this should have been a security bug no? | 14:44 |
bauzas | dansmith: checking, but I think admin and demo share the same project | 14:45 |
dansmith | oh, then that explains it, but.. they shouldn't be right? | 14:45 |
dansmith | isn't admin in the admin project and demo in the demo project? | 14:45 |
bauzas | dansmith: I'm just verifying that with Uggla | 14:45 |
bauzas | dansmith: double-checking but yeah | 14:46 |
bauzas | so, we tested with both an admin and a demo user from the same project | 14:53 |
opendevreview | Merged openstack/placement master: Move implemented specs for Xena and Yoga release https://review.opendev.org/c/openstack/placement/+/853730 | 15:11 |
opendevreview | Stephen Finucane proposed openstack/nova master: db: Don't rely on branched connections https://review.opendev.org/c/openstack/nova/+/880663 | 16:07 |
opendevreview | Stephen Finucane proposed openstack/nova master: db: Remove unnecessary 'insert()' argument https://review.opendev.org/c/openstack/nova/+/880664 | 16:07 |
opendevreview | Stephen Finucane proposed openstack/nova master: tests: Pass parameters to sqlalchemy.text() as bindparams https://review.opendev.org/c/openstack/nova/+/880665 | 16:07 |
opendevreview | Stephen Finucane proposed openstack/nova master: fixup! db: Remove unnecessary 'insert()' argument https://review.opendev.org/c/openstack/nova/+/880666 | 16:07 |
opendevreview | Stephen Finucane proposed openstack/nova master: tests: Add missing args to sqlalchemy.Table https://review.opendev.org/c/openstack/nova/+/880667 | 16:07 |
opendevreview | Stephen Finucane proposed openstack/nova master: db: Avoid relying on autobegin https://review.opendev.org/c/openstack/nova/+/880668 | 16:07 |
opendevreview | Stephen Finucane proposed openstack/nova master: tests: Remove test for DB special characters https://review.opendev.org/c/openstack/nova/+/880669 | 16:07 |
opendevreview | Stephen Finucane proposed openstack/nova master: db: Remove unnecessary 'insert()' argument https://review.opendev.org/c/openstack/nova/+/880664 | 16:08 |
opendevreview | Stephen Finucane proposed openstack/nova master: tests: Pass parameters to sqlalchemy.text() as bindparams https://review.opendev.org/c/openstack/nova/+/880665 | 16:08 |
opendevreview | Stephen Finucane proposed openstack/nova master: tests: Add missing args to sqlalchemy.Table https://review.opendev.org/c/openstack/nova/+/880667 | 16:08 |
opendevreview | Stephen Finucane proposed openstack/nova master: db: Avoid relying on autobegin https://review.opendev.org/c/openstack/nova/+/880668 | 16:08 |
opendevreview | Stephen Finucane proposed openstack/nova master: tests: Remove test for DB special characters https://review.opendev.org/c/openstack/nova/+/880669 | 16:08 |
opendevreview | Merged openstack/placement master: Update python testing as per zed cycle testing runtime https://review.opendev.org/c/openstack/placement/+/841690 | 16:41 |
opendevreview | Dan Smith proposed openstack/nova master: Remove silent failure to find a node on rebuild https://review.opendev.org/c/openstack/nova/+/880632 | 17:11 |
opendevreview | Dan Smith proposed openstack/nova master: Stop ignoring missing compute nodes in claims https://review.opendev.org/c/openstack/nova/+/880633 | 17:11 |
dansmith | gibi: sean-k-mooney: bauzas: re our conversation this morning, I've seen this on plenty of volume tests lately as well: https://04e1a81b9004b13f1732-58d59663991bfd1d0dc3ba3167f83a98.ssl.cf1.rackcdn.com/880632/2/check/nova-ceph-multistore/f6ebaaf/testr_results.html | 19:30 |
dansmith | i.e. several kernel crashes and, obviously, a failure to detach later | 19:30 |
dansmith | it's this kind of thing that makes me highly suspect of cirros | 19:31 |
sean-k-mooney | dansmith: if its the old cirror image form the 5.2 release | 22:15 |
sean-k-mooney | actully thts a differnt crash then i have seen in the past | 22:16 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!