# Rabbit storage for containerized applications
For Rabbit to provide storage to a containerized application, some mechanism is needed to describe and deliver that storage. The remainder of this RFC proposes that mechanism.
## Actors
There are several actors involved:
- The AUTHOR of the containerized application
- The ADMINISTRATOR who works with the author to determine the application requirements for execution
- The USER who intends to use the application via the 'container' directive in their job specification
- The RABBIT software that interprets the #DWs and starts the container during execution of the job
There are multiple relationships between the actors:
- AUTHOR to ADMINISTRATOR: The author tells the administrator how their application is executed and what NNF storage it requires.
- AUTHOR to USER: The application expects certain storage, and the #DW must meet those expectations.
- ADMINISTRATOR to RABBIT: The administrator tells Rabbit how to run the containerized application with the required storage.
- USER to RABBIT: The user provides the #DW container directive in the job specification. Rabbit validates and interprets the directive.
## Proposal
The proposal below might take a couple of read-throughs; I've also added a concrete example afterward that might help.
- The AUTHOR writes their application expecting NNF Storage at specific locations. For each storage requirement, they define:
    - a unique name for the storage, which can be referenced in the 'container' directive
    - the expected storage types, if necessary
    - the required mount path or mount path prefix
    - other constraints or storage requirements (e.g. minimum capacity)
- The AUTHOR works with the ADMINISTRATOR to define:
    - a unique name for the program, to be referenced by the USER
    - the pod template specification for executing their program
    - the NNF storage requirements described above
- The ADMINISTRATOR creates a corresponding NNF Container Profile custom Kubernetes resource with the necessary NNF storage requirements and pod specification as described by the AUTHOR.
- The USER who desires to use the application works with the AUTHOR and the related NNF Container Profile to understand the storage requirements.
- The USER submits a WLM job with the #DW container fields populated
- WLM runs the job and drives it through the following stages...
- Proposal: RABBIT validates the #DW container directive by comparing the supplied values to what is listed in the NNF Container Profile. If the USER fails to meet the requirements, the job fails.
- Pre-run: RABBIT software will:
    - create a config map reflecting the storage requirements and any runtime parameters; this is provided to the container at the volume mount named "nnf-config", if specified
    - duplicate the pod template specification from the Container Profile and patch in the necessary volumes and the config map; this spec is used as the basis for starting the necessary pods and containers
- The containerized application executes. The expected mounts are available per the requirements and celebration occurs.
## Example
Say I authored a simple application, `foo`, that requires Rabbit local GFS2 storage and a persistent Lustre storage volume. As the author, my program is coded to expect the GFS2 volume to be mounted at `/foo/local` and the Lustre volume to be mounted at `/foo/persistent`. In this case, the storages are not optional, so they are defined as such in the NNF Container Profile.

Working with an administrator, my application's storage requirements and pod specification are placed in an NNF Container Profile named `foo`:
```yaml
kind: NnfContainerProfile
apiVersion: v1alpha1
metadata:
  name: foo
  namespace: default
spec:
  storages:
    - name: JOB_DW_foo-local-storage
      optional: false
    - name: PERSISTENT_DW_foo-persistent-storage
      optional: false
  template:
    metadata:
      name: foo
      namespace: default
    spec:
      containers:
        - name: foo
          image: foo:latest
          command:
            - /foo
          volumeMounts:
            - name: foo-local-storage
              mountPath: /foo/local
            - name: foo-persistent-storage
              mountPath: /foo/persistent
            - name: nnf-config
              mountPath: /nnf/config
```
Say Peter wants to use `foo` as part of his job specification. Peter would submit the job with the directives below:

```
#DW jobdw name=my-gfs2 type=gfs2 capacity=1TB
#DW persistentdw name=some-lustre
#DW container name=my-foo profile=foo \
    JOB_DW_foo-local-storage=my-gfs2 \
    PERSISTENT_DW_foo-persistent-storage=some-lustre
```
Since the NNF Container Profile specifies that both storages are not optional (i.e. `optional: false`), they must both be present in the #DW directives along with the `container` directive. Alternatively, if either were marked as optional (i.e. `optional: true`), it would not be required to be present in the #DW directives and therefore would not be mounted into the container.
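For illustration, if the profile marked `PERSISTENT_DW_foo-persistent-storage` as `optional: true`, Peter could omit the persistent storage and supply only the job-local storage. This is a hypothetical variation of the directives above:

```
#DW jobdw name=my-gfs2 type=gfs2 capacity=1TB
#DW container name=my-foo profile=foo \
    JOB_DW_foo-local-storage=my-gfs2
```

In that case, only `/foo/local` would be mounted into the container.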
Peter submits the job to the WLM. WLM guides the job through the workflow states:
- Proposal: Rabbit software verifies the #DW directives. For the container directive `my-foo` with profile `foo`, the storage requirements listed in the NNF Container Profile are `foo-local-storage` and `foo-persistent-storage`. These values are correctly represented by the directive, so it is valid.
- Setup: Since there is a jobdw, `my-gfs2`, Rabbit software provisions this storage.
- Pre-Run:
    - Rabbit software generates a config map that corresponds to the storage requirements and runtime parameters (a sketch of such a config map is shown after this list).
    - Rabbit software duplicates the `foo` pod template spec in the NNF Container Profile and fills in the necessary volumes and config map:

      ```yaml
      kind: Pod
      apiVersion: v1
      metadata:
        name: my-job-container-my-foo
      template:
        metadata:
          name: foo
          namespace: default
        spec:
          containers:
            # This section unchanged from Container Profile
            - name: foo
              image: foo:latest
              command:
                - /foo
              volumeMounts:
                - name: foo-local-storage
                  mountPath: /foo/local
                - name: foo-persistent-storage
                  mountPath: /foo/persistent
                - name: nnf-config
                  mountPath: /nnf/config
          # volumes added by Rabbit software
          volumes:
            - name: foo-local-storage
              hostPath:
                path: /nnf/job/my-job/my-gfs2
            - name: foo-persistent-storage
              hostPath:
                path: /nnf/persistent/some-lustre
            - name: nnf-config
              configMap:
                name: my-job-container-my-foo
          # securityContext added by Rabbit software - values will be inherited from the workflow
          securityContext:
            runAsUser: 1000
            runAsGroup: 2000
            fsGroup: 2000
      ```

    - Rabbit software starts the pods on Rabbit nodes.
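As a rough sketch of the config map mentioned in Pre-Run, something like the following could be delivered to the container at the `nnf-config` mount. The key names and value layout here are assumptions for illustration only; the actual contents are defined by the Rabbit software.

```yaml
# Hypothetical sketch - key names and value layout are illustrative assumptions.
kind: ConfigMap
apiVersion: v1
metadata:
  name: my-job-container-my-foo
  namespace: default
data:
  # One entry per storage requirement, describing how it was satisfied.
  JOB_DW_foo-local-storage: "mountPath=/foo/local type=gfs2 mountType=indexed-mount"
  PERSISTENT_DW_foo-persistent-storage: "mountPath=/foo/persistent type=lustre mountType=mount"
```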
## Security
Kubernetes provides a way to define permissions for a container using a Security Context. This can be seen in the pod template spec above. The user and group IDs are inherited from the Workflow's spec.
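For illustration, assuming the Workflow resource carries the job's user and group IDs in its spec (the field names below are assumptions), Rabbit software would map them into the pod's securityContext roughly like this:

```yaml
# Hypothetical Workflow spec fragment; field names are assumptions for illustration.
spec:
  userID: 1000     # becomes runAsUser in the pod securityContext
  groupID: 2000    # becomes runAsGroup and fsGroup in the pod securityContext
```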
## Special Note: Indexed-Mount Type
When using a file system like XFS or GFS2, each compute node is allocated its own Rabbit volume. The Rabbit software mounts a collection of mount paths with a common prefix and an ending indexed value.

Application AUTHORS must be aware that their desired mount point really contains a collection of directories, one for each compute node. The mount point type can be determined by consulting the config map values.
Continuing the example from above, the `foo` application would expect the `foo-local-storage` path of `/foo/local` to contain several directories, one for each compute node.

Node positions are not absolute locations. WLM could, in theory, select 6 physical compute nodes at physical locations 1, 2, 3, 5, 8, and 13, which would appear as directories `/node-0` through `/node-5` in the container path.
Symlinks will be added to support the physical compute node names. Assuming a compute node hostname of `compute-node-1` from the example above, it would link to `node-0`, `compute-node-2` would link to `node-1`, and so on.
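Putting those pieces together, the container from the example might see a layout like the following under `/foo/local` (the hostnames are hypothetical):

```
/foo/local/node-0
/foo/local/node-1
/foo/local/node-2
...
/foo/local/compute-node-1 -> node-0
/foo/local/compute-node-2 -> node-1
/foo/local/compute-node-3 -> node-2
...
```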
Additionally, not all container instances will necessarily see the same number of compute nodes in an indexed-mount scenario. If 17 compute nodes are required for the job, WLM may assign 16 compute nodes to one Rabbit and 1 compute node to another Rabbit.