
Managing Security in a Lustre File System

This chapter describes the security features of the Lustre file system: POSIX ACLs, root squash, isolating clients to a sub-directory tree, and checking the SELinux policy enforced by Lustre clients.

Using ACLs

An access control list (ACL) is a set of data that informs an operating system about the permissions, or access rights, that each user or group has to specific system objects, such as directories or files. Each object has a unique security attribute that identifies the users who have access to it. The ACL lists each object together with the access privileges, such as read, write, or execute, granted to each user or group.

How ACLs Work

Implementing ACLs varies between operating systems. Systems that support the Portable Operating System Interface (POSIX) family of standards share a simple yet powerful file system permission model, which should be well-known to the Linux/UNIX administrator. ACLs add finer-grained permissions to this model, allowing for more complicated permission schemes. For a detailed explanation of ACLs on a Linux operating system, refer to the SUSE Labs article Posix Access Control Lists on Linux.

We have implemented ACLs according to this model. The Lustre software works with the standard Linux ACL tools, setfacl, getfacl, and the historical chacl, normally installed with the ACL package.

Note

ACL support is a system-wide feature: either all clients have ACLs enabled or none do. You cannot enable ACLs for only a subset of clients.

Using ACLs with the Lustre Software

POSIX Access Control Lists (ACLs) can be used with the Lustre software. An ACL consists of file entries representing permissions based on standard POSIX file system object permissions that define three classes of user (owner, group and other). Each class is associated with a set of permissions [read (r), write (w) and execute (x)].

  • Owner class permissions define access privileges of the file owner.
  • Group class permissions define access privileges of the owning group.
  • Other class permissions define access privileges of all users not in the owner or group class.

The ls -l command displays the owner, group, and other class permissions in the first column of its output (for example, -rw-r----- for a regular file with read and write access for the owner class, read access for the group class, and no access for others).

Minimal ACLs have three entries. Extended ACLs have more than the three entries. Extended ACLs also contain a mask entry and may contain any number of named user and named group entries.

The MDS needs to be configured to enable ACLs. Use --mountfsoptions to enable ACLs when creating your configuration:

$ mkfs.lustre --fsname spfs --mountfsoptions=acl --mdt --mgs /dev/sda

Alternatively, you can enable ACLs at mount time by using the -o acl option when mounting the MDT:

$ mount -t lustre -o acl /dev/sda /mnt/mdt

To check ACLs on the MDS:

$ lctl get_param -n mdc.home-MDT0000-mdc-*.connect_flags | grep acl
acl

To mount the client with no ACLs:

$ mount -t lustre -o noacl ibmds2@o2ib:/home /home

ACLs are enabled in a Lustre file system on a system-wide basis: either all clients enable ACLs or none do. Activating ACLs is controlled by the MDS mount options acl / noacl (enable/disable ACLs). The client-side acl/noacl mount options are ignored, so you do not need to change the client configuration, and the acl string will not appear in the client /etc/mtab. If a client is nevertheless mounted with the acl option, the following message appears in the MDS syslog:

...MDS requires ACL support but client does not

The message is harmless but indicates a configuration issue, which should be corrected.

If ACLs are not enabled on the MDS, then any attempts to reference an ACL on a client return an Operation not supported error.
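For example, an attempt to set an ACL from a client while the MDS lacks ACL support fails like this (the mount point and directory name are illustrative):

client# setfacl -m user:chirag:rwx /mnt/lustre/rain
setfacl: /mnt/lustre/rain: Operation not supported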

Examples

These examples are taken directly from the POSIX paper referenced above. ACLs on a Lustre file system work exactly like ACLs on any Linux file system. They are manipulated with the standard tools in the standard manner. Below, we create a directory and allow a specific user access.

[root@client lustre]# umask 027
[root@client lustre]# mkdir rain
[root@client lustre]# ls -ld rain
drwxr-x---  2 root root 4096 Feb 20 06:50 rain
[root@client lustre]# getfacl rain
# file: rain
# owner: root
# group: root
user::rwx
group::r-x
other::---
 
[root@client lustre]# setfacl -m user:chirag:rwx rain
[root@client lustre]# ls -ld rain
drwxrwx---+ 2 root root 4096 Feb 20 06:50 rain
[root@client lustre]# getfacl --omit-header rain
user::rwx
user:chirag:rwx
group::r-x
mask::rwx
other::---

Using Root Squash

Root squash is a security feature that restricts super-user access rights to a Lustre file system. Without root squash enabled, Lustre file system users on untrusted clients could access or modify files owned by root on the file system, including deleting them. With root squash enabled, file access and modification as the root user is restricted to the specified clients only. Note, however, that this does not prevent users on insecure clients from accessing files owned by other users.

The root squash feature works by re-mapping the user ID (UID) and group ID (GID) of the root user to a UID and GID specified by the system administrator, via the Lustre configuration management server (MGS). The root squash feature also enables the Lustre file system administrator to specify a set of clients for which UID/GID re-mapping does not apply.
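To illustrate the effect (assuming root has been squashed to 65534:65534 and the target directory is writable by the squashed identity; the client name and mount point are illustrative), a file created by root on a squashed client is owned by the squashed identity rather than by root:

client# touch /mnt/lustre/newfile
client# ls -ln /mnt/lustre/newfile
-rw-r--r-- 1 65534 65534 0 Feb 20 06:50 /mnt/lustre/newfile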

Note

Nodemaps (Mapping UIDs and GIDs with Nodemap) are an alternative to root squash, since they also allow root squash to be applied on a per-client basis. With UID maps, the clients can even have a local root UID without actually having root access to the filesystem itself.

Configuring Root Squash

Root squash functionality is managed by two configuration parameters, root_squash and nosquash_nids.

  • The root_squash parameter specifies the UID and GID with which the root user accesses the Lustre file system.
  • The nosquash_nids parameter specifies the set of clients to which root squash does not apply. LNet NID range syntax is used for this parameter (see the NID range syntax rules described in the section called “Tips on Using Root Squash” below). For example:
nosquash_nids=172.16.245.[0-255/2]@tcp

In this example, root squash does not apply to TCP clients on subnet 172.16.245.0 that have an even number as the last component of their IP address.

Enabling and Tuning Root Squash

The default value for nosquash_nids is NULL, which means that root squashing applies to all clients. Setting the root squash UID and GID to 0 turns root squash off.
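For example, to turn root squash off on a file system named testfs (the name is taken from the examples below):

mgs# lctl conf_param testfs.mdt.root_squash="0:0"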

Root squash parameters can be set when the MDT is created (mkfs.lustre --mdt). For example:

mds# mkfs.lustre --reformat --fsname=testfs --mdt --mgs \
       --param "mdt.root_squash=500:501" \
       --param "mdt.nosquash_nids='0@elan1 192.168.1.[10,11]'" /dev/sda1

Root squash parameters can also be changed on an unmounted device with tunefs.lustre. For example:

tunefs.lustre --param "mdt.root_squash=65534:65534"  \
--param "mdt.nosquash_nids=192.168.0.13@tcp0" /dev/sda1

Root squash parameters can also be changed with the lctl conf_param command. For example:

mgs# lctl conf_param testfs.mdt.root_squash="1000:101"
mgs# lctl conf_param testfs.mdt.nosquash_nids="*@tcp"

To retrieve the current root squash parameter settings, the following lctl get_param commands can be used:

mgs# lctl get_param mdt.*.root_squash
mgs# lctl get_param mdt.*.nosquash_nids

Note

When using the lctl conf_param command, keep in mind:

  • lctl conf_param must be run on a live MGS
  • lctl conf_param causes the parameter to change on all MDSs
  • lctl conf_param should be used only once per parameter

The root squash settings can also be changed temporarily with lctl set_param or persistently with lctl set_param -P. For example:

mgs# lctl set_param mdt.testfs-MDT0000.root_squash="1:0"
mgs# lctl set_param -P mdt.testfs-MDT0000.root_squash="1:0"

The nosquash_nids list can be cleared with:

mgs# lctl conf_param testfs.mdt.nosquash_nids="NONE"

- OR -

mgs# lctl conf_param testfs.mdt.nosquash_nids="clear"

If the nosquash_nids value consists of several NID ranges (e.g. 0@elan, 1@elan1), the list of NID ranges must be quoted with single (') or double (") quotation marks. List elements must be separated with a space. For example:

mds# mkfs.lustre ... --param "mdt.nosquash_nids='0@elan1 1@elan2'" /dev/sda1
lctl conf_param testfs.mdt.nosquash_nids="24@elan 15@elan1"

These are examples of incorrect syntax:

mds# mkfs.lustre ... --param "mdt.nosquash_nids=0@elan1 1@elan2" /dev/sda1
lctl conf_param testfs.mdt.nosquash_nids=24@elan 15@elan1

To check root squash parameters, use the lctl get_param command:

mds# lctl get_param mdt.testfs-MDT0000.root_squash
lctl get_param mdt.*.nosquash_nids

Note

An empty nosquash_nids list is reported as NONE.
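For example, after clearing the list, the output looks similar to the following (the device name is illustrative):

mds# lctl get_param mdt.testfs-MDT0000.nosquash_nids
mdt.testfs-MDT0000.nosquash_nids=NONE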

Tips on Using Root Squash

Lustre configuration management limits root squash in several ways.

  • The lctl conf_param value overwrites the parameter's previous value. If the new value uses incorrect syntax, the system continues with the old parameters and the previously correct value is lost on remount. Therefore, exercise care when tuning the root squash parameters.
  • mkfs.lustre and tunefs.lustre do not perform parameter syntax checking. If the root squash parameters are incorrect, they are ignored on mount and the default values are used instead.
  • Root squash parameters are parsed with rigorous syntax checking. The root_squash parameter should be specified as <decnum>:<decnum>. The nosquash_nids parameter should follow LNet NID range list syntax.

LNet NID range syntax:

<nidlist>       :== <nidrange> [ ' ' <nidrange> ]
<nidrange>      :== <addrrange> '@' <net>
<addrrange>     :== '*' |
                    <ipaddr_range> |
                    <numaddr_range>
<ipaddr_range>  :== <numaddr_range>.<numaddr_range>.<numaddr_range>.<numaddr_range>
<numaddr_range> :== <number> |
                    <expr_list>
<expr_list>     :== '[' <range_expr> [ ',' <range_expr> ] ']'
<range_expr>    :== <number> |
                    <number> '-' <number> |
                    <number> '-' <number> '/' <number>
<net>           :== <netname> | <netname><number>
<netname>       :== "lo" | "tcp" | "o2ib" | "ra" | "elan"
<number>        :== <nonnegative decimal> | <hexadecimal>

Note

For networks using numeric addresses (e.g. elan), the address range must be specified in the <numaddr_range> syntax. For networks using IP addresses, the address range must be specified in the <ipaddr_range> syntax. For example, if elan is using numeric addresses, 1.2.3.4@elan is incorrect.
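To illustrate the grammar, here are a few nidlist strings that follow the syntax above (the addresses are only examples):

192.168.0.[2,4-10/2]@tcp    matches 192.168.0.2, .4, .6, .8 and .10 on the tcp network
[0-63]@elan                 matches numeric Elan addresses 0 through 63
*@tcp1                      matches every client on the tcp1 network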

Isolating Clients to a Sub-directory Tree

Isolation is the Lustre implementation of the generic concept of multi-tenancy, which aims at providing separated namespaces from a single filesystem. Lustre Isolation enables different populations of users on the same file system beyond normal Unix permissions/ACLs, even when users on the clients may have root access. Those tenants share the same file system, but they are isolated from each other: they cannot access or even see each other’s files, and are not aware that they are sharing common file system resources.

Lustre Isolation leverages the Fileset feature (the section called “Fileset Feature”) to mount only a subdirectory of the filesystem rather than the root directory. In order to achieve isolation, the subdirectory mount, which presents to tenants only their own fileset, has to be imposed on the clients. To that end, we make use of the nodemap feature (Mapping UIDs and GIDs with Nodemap). We group all clients used by a tenant under a common nodemap entry, and we assign to this nodemap entry the fileset to which the tenant is restricted.
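As a sketch of that grouping step (the nodemap name and NID range are illustrative; see Mapping UIDs and GIDs with Nodemap for the full procedure):

mgs# lctl nodemap_activate 1
mgs# lctl nodemap_add tenant1
mgs# lctl nodemap_add_range --name tenant1 --range 192.168.10.[1-100]@tcp

The fileset restricting this tenant is then attached to the tenant1 nodemap entry, as shown in the section called “Configuring Isolation”.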

Identifying Clients

Enforcing multi-tenancy on Lustre relies on the ability to properly identify the client nodes used by a tenant, and trust those identities. This can be achieved by having physical hardware and/or network security, so that client nodes have well-known NIDs. It is also possible to make use of strong authentication with Kerberos or Shared-Secret Key (see Configuring Shared-Secret Key (SSK) Security). Kerberos prevents NID spoofing, as every client needs its own credentials, based on its NID, in order to connect to the servers. Shared-Secret Key also prevents tenant impersonation, because keys can be linked to a specific nodemap. See the section called “Role of Nodemap in SSK” for detailed explanations.

Configuring Isolation

Isolation on Lustre can be achieved by setting the fileset parameter on a nodemap entry. All clients belonging to this nodemap entry will automatically mount this fileset instead of the root directory. For example:

mgs# lctl nodemap_set_fileset --name tenant1 --fileset '/dir1'

All clients matching the tenant1 nodemap will automatically be presented with the fileset /dir1 when mounting. In effect, these clients perform an implicit subdirectory mount of /dir1.
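To confirm the setting on the MGS (the output format shown is illustrative):

mgs# lctl get_param nodemap.tenant1.fileset
nodemap.tenant1.fileset=/dir1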

Note

If the subdirectory defined as a fileset does not exist on the file system, any client belonging to the nodemap will be prevented from mounting Lustre.

To delete the fileset parameter, just set it to an empty string:

mgs# lctl nodemap_set_fileset --name tenant1 --fileset ''

Making Isolation Permanent

In order to make isolation permanent, the fileset parameter on the nodemap has to be set with lctl set_param with the -P option.

mgs# lctl set_param nodemap.tenant1.fileset=/dir1
mgs# lctl set_param -P nodemap.tenant1.fileset=/dir1

This way the fileset parameter will be stored in the Lustre config logs, letting the servers retrieve the information after a restart.
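On the client side, the effect can be observed after mounting: the mount point presents the contents of the fileset rather than the file system root (the MGS NID, file system name and mount point below are illustrative):

client# mount -t lustre mgsnode@tcp:/testfs /mnt/lustre
client# ls /mnt/lustre        # lists the contents of /dir1, not the file system root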

Introduced in Lustre 2.13

Checking SELinux Policy Enforced by Lustre Clients

SELinux provides a mechanism in Linux for supporting Mandatory Access Control (MAC) policies. When a MAC policy is enforced, the operating system's (OS) kernel defines application rights, firewalling applications so that they cannot compromise the entire system. Regular users do not have the ability to override the policy.

One purpose of SELinux is to protect the OS from privilege escalation. To that end, SELinux defines confined and unconfined domains for processes and users. Each process, user, and file is assigned a security context, and rules define the operations that processes and users are allowed to perform on files.

Another purpose of SELinux can be to protect data sensitivity, thanks to Multi-Level Security (MLS). MLS works on top of SELinux, by defining the concept of security levels in addition to domains. Each process, user and file is assigned a security level, and the model states that processes and users can read the same or lower security level, but can only write to their own or higher security level.

From a file system perspective, the security context of files must be stored permanently. Lustre makes use of the security.selinux extended attribute on files to hold this information. Lustre supports SELinux on the client side. All you have to do to have MAC and MLS on Lustre is to enforce the appropriate SELinux policy (as provided by the Linux distribution) on all Lustre clients. No SELinux is required on the Lustre servers.

Because Lustre is a distributed file system, the particular requirement when using MLS is that data must always be accessed by nodes that enforce the SELinux MLS policy; otherwise, the data is not protected. This means Lustre has to check that SELinux is properly enforced on the client side, with the right, unaltered policy. If SELinux is not enforced as expected on a client, the server denies it access to Lustre.

Determining SELinux Policy Info

A string that represents the SELinux Status info will be used by servers as a reference, to check if clients are enforcing SELinux properly. This reference string can be obtained on a client node known to enforce the right SELinux policy, by calling the l_getsepol command line utility:

client# l_getsepol
SELinux status info: 1:mls:31:40afb76d077c441b69af58cccaaa2ca63641ed6e21b0a887dc21a684f508b78f

The string describing the SELinux policy has the following syntax:

mode:name:version:hash

where:

  • mode is a digit telling if SELinux is in Permissive mode (0) or Enforcing mode (1)
  • name is the name of the SELinux policy
  • version is the version of the SELinux policy
  • hash is the computed hash of the binary representation of the policy, as exported in /etc/selinux/name/policy/policy.version
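Taking the l_getsepol output shown above as an example: mode is 1 (SELinux is in Enforcing mode), name is mls, version is 31, and the final field is the computed hash of the binary policy.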

Enforcing SELinux Policy Check

SELinux policy check can be enforced by setting the sepol parameter on a nodemap entry. All clients belonging to this nodemap entry must enforce the SELinux policy described by this parameter, otherwise they are denied access to the Lustre file system. For example:

mgs# lctl nodemap_set_sepol --name restricted \
     --sepol '1:mls:31:40afb76d077c441b69af58cccaaa2ca63641ed6e21b0a887dc21a684f508b78f'

All clients matching the restricted nodemap must enforce the SELinux policy whose description matches 1:mls:31:40afb76d077c441b69af58cccaaa2ca63641ed6e21b0a887dc21a684f508b78f. If not, they will get a Permission Denied error when trying to mount or access files on the Lustre file system.

To delete the sepol parameter, just set it to an empty string:

mgs# lctl nodemap_set_sepol --name restricted --sepol ''

See Mapping UIDs and GIDs with Nodemap for more details about the Nodemap feature.

Making SELinux Policy Check Permanent

In order to make SELinux Policy check permanent, the sepol parameter on the nodemap has to be set with lctl set_param with the -P option.

mgs# lctl set_param nodemap.restricted.sepol=1:mls:31:40afb76d077c441b69af58cccaaa2ca63641ed6e21b0a887dc21a684f508b78f
mgs# lctl set_param -P nodemap.restricted.sepol=1:mls:31:40afb76d077c441b69af58cccaaa2ca63641ed6e21b0a887dc21a684f508b78f

This way the sepol parameter will be stored in the Lustre config logs, letting the servers retrieve the information after a restart.

Sending SELinux Status Info from Clients

In order for Lustre clients to send their SELinux status information when SELinux is enabled locally, the send_sepol ptlrpc kernel module parameter has to be set to a non-zero value (see the example after the list). send_sepol accepts various values:

  • 0: do not send SELinux policy info;
  • -1: fetch SELinux policy info for every request;
  • N > 0: only fetch SELinux policy info every N seconds. Use N = 2^31-1 to have SELinux policy info fetched only at mount time.
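Since send_sepol is a ptlrpc kernel module parameter, it can be set like any other module parameter; the sysfs path and configuration file name below are assumptions based on standard kernel module conventions, not Lustre-specific commands:

client# echo -1 > /sys/module/ptlrpc/parameters/send_sepol     # runtime change (assumed sysfs path)
client# echo "options ptlrpc send_sepol=-1" > /etc/modprobe.d/lustre-sepol.conf     # applied at module load (illustrative file name)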

Clients that are part of a nodemap on which sepol is defined must send SELinux status info, and the SELinux policy they enforce must match the representation stored in the nodemap; otherwise, they are denied access to the Lustre file system.