# Global Lustre

## Background
Adding global lustre to rabbit systems allows access to external file systems. This is primarily
used for Data Movement, where a user can perform `copy_in` and `copy_out` directives with global
lustre as the source and destination, respectively.
Global lustre file systems are represented by the `lustrefilesystems` resource in Kubernetes:

```console
$ kubectl get lustrefilesystems -A
NAMESPACE   NAME       FSNAME     MGSNIDS          AGE
default     mylustre   mylustre   10.1.1.113@tcp   20d
```
An example resource is as follows:

```yaml
apiVersion: lus.cray.hpe.com/v1beta1
kind: LustreFileSystem
metadata:
  name: mylustre
  namespace: default
spec:
  mgsNids: 10.1.1.100@tcp
  mountRoot: /p/mylustre
  name: mylustre
  namespaces:
    default:
      modes:
      - ReadWriteMany
```
## Namespaces
Note the `spec.namespaces` field. For each namespace listed, the `lustre-fs-operator` creates a
PV/PVC pair in that namespace, which allows pods in that namespace to access global lustre. The
`default` namespace should appear in this list. This makes the `lustrefilesystem` resource
available to the `default` namespace, which in turn makes it available to containers
(e.g. container workflows) running in the `default` namespace.
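As an illustrative sketch (the resource name `mylustre` and the namespace `mynamespace` are assumptions carried over from the example above), an additional namespace can be added to an existing resource with a merge patch:

```console
$ kubectl patch lustrefilesystems mylustre -n default --type merge \
    -p '{"spec":{"namespaces":{"mynamespace":{"modes":["ReadWriteMany"]}}}}'
```

After the patch, the `lustre-fs-operator` should create a PV/PVC pair in `mynamespace`, just as it does for `default`.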
The `nnf-dm-system` namespace is added automatically; there is no need to specify it manually
here. The NNF Data Movement Manager is responsible for ensuring that `nnf-dm-system` is in
`spec.namespaces`, so that the NNF DM Worker pods have global lustre mounted as long as `nnf-dm`
is deployed. To unmount global lustre from the NNF DM Worker pods, the `lustrefilesystem`
resource must be deleted.

The `lustrefilesystem` resource itself should be created in the `default` namespace
(i.e. `metadata.namespace`).
## NNF Data Movement Manager
The NNF Data Movement Manager is responsible for monitoring `lustrefilesystem` resources and
mounting (or unmounting) the global lustre file system in each of the NNF DM Worker pods. These
pods run on each of the NNF nodes. This means that with each addition or removal of a
`lustrefilesystem` resource, the DM Worker pods restart to adjust their mount points.

The NNF Data Movement Manager also places a finalizer on the `lustrefilesystem` resource to
indicate that the resource is in use by Data Movement. This prevents the PV/PVC pair from being
deleted while it is in use by pods.
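To inspect the finalizers on a `lustrefilesystem` resource (using the `mylustre` example from above), a standard `jsonpath` query can be used:

```console
$ kubectl get lustrefilesystems mylustre -n default \
    -o jsonpath='{.metadata.finalizers}'
```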
## Adding Global Lustre
As mentioned previously, the NNF Data Movement Manager monitors these resources and automatically
adds the `nnf-dm-system` namespace to all `lustrefilesystem` resources. Once this happens, a
PV/PVC pair is created for the `nnf-dm-system` namespace to access global lustre. The Manager
updates the NNF DM Worker pods, which are then restarted to mount the global lustre file system.
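The effect can be observed (pod and claim names below are illustrative, not prescribed by this document) by listing the claims and watching the worker pods restart:

```console
$ kubectl get pvc -n nnf-dm-system
$ kubectl get pods -n nnf-dm-system -w
```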
## Removing Global Lustre
When a `lustrefilesystem` resource is deleted, the NNF DM Manager takes notice and starts to
unmount the file system from the DM Worker pods, causing another restart of the DM Worker pods.
Once this is finished, the DM finalizer is removed from the `lustrefilesystem` resource to signal
that it is no longer in use by Data Movement.
If a `lustrefilesystem` resource does not delete, check its finalizers to see what might still be
using it. It is possible to get into a situation where `nnf-dm` has been undeployed, so there is
nothing left to remove the DM finalizer from the `lustrefilesystem` resource. If that is the
case, manually remove the DM finalizer so that deletion of the `lustrefilesystem` resource can
continue.
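One way to do this manually (shown as a sketch; this merge patch clears *all* finalizers on the resource, so first confirm that only the DM finalizer remains):

```console
$ kubectl patch lustrefilesystems mylustre -n default --type merge \
    -p '{"metadata":{"finalizers":null}}'
```

Alternatively, `kubectl edit lustrefilesystems mylustre -n default` can be used to remove just the DM finalizer entry from the list.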