# System Storage

## Background

System storage allows an admin to configure Rabbit storage without a DWS workflow. This is useful for creating storage that is outside the scope of any job. One use case for system storage is to create a pair of LVM VGs on the Rabbit nodes that can be used to work around an lvmlockd bug. The lockspace for these VGs can be started on the compute nodes, holding the `lvm_global` lock open while other Rabbit VG lockspaces are started and stopped.
## NnfSystemStorage Resource

System storage is created through the `NnfSystemStorage` resource. By default, system storage creates an allocation on every Rabbit in the system and exposes the storage to all computes. This behavior can be modified through fields in the `NnfSystemStorage` resource. An `NnfSystemStorage` resource has the following fields in its `Spec` section:
| Field | Required | Default | Value | Notes |
|---|---|---|---|---|
| `SystemConfiguration` | No | Empty | `ObjectReference` to the `SystemConfiguration` to use | By default, the `default`/`default` `SystemConfiguration` is used |
| `IncludeRabbits` | No | Empty | A list of Rabbit node names | Rather than using all the Rabbits in the `SystemConfiguration`, use only the Rabbits in this list |
| `ExcludeRabbits` | No | Empty | A list of Rabbit node names | Use all the Rabbits in the `SystemConfiguration` except those in this list |
| `IncludeComputes` | No | Empty | A list of compute node names | Rather than using the `SystemConfiguration` to determine which computes are attached to the Rabbits being used, use only the compute nodes in this list |
| `ExcludeComputes` | No | Empty | A list of compute node names | Use the `SystemConfiguration` to determine which computes are attached to the Rabbits being used, but omit the computes in this list |
| `ComputesTarget` | Yes | `all` | `all`, `even`, `odd`, `pattern` | Use only certain compute nodes based on their index as determined from the `SystemConfiguration`. `all` uses all computes. `even` uses computes with an even index. `odd` uses computes with an odd index. `pattern` uses computes with the indexes specified in `Spec.ComputesPattern` |
| `ComputesPattern` | No | Empty | A list of integers [0-15] | If `ComputesTarget` is `pattern`, the storage is made available on the compute nodes with the indexes in this list |
| `Capacity` | Yes | `1073741824` | Integer | Number of bytes to allocate per Rabbit |
| `Type` | Yes | `raw` | `raw`, `xfs`, `gfs2` | Type of file system to create on the Rabbit storage |
| `StorageProfile` | Yes | None | `ObjectReference` to an `NnfStorageProfile` | The storage profile must be marked as pinned |
| `MakeClientMounts` | Yes | `false` | Bool | Create `ClientMount` resources to mount the storage on the compute nodes. If this is `false`, the devices are made available to the compute nodes without mounting the file system |
| `ClientMountPath` | No | None | Path | Path where the file system is mounted on the compute nodes |
`NnfSystemStorage` resources can be created in any namespace.
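The index-based selection performed by `ComputesTarget` and `ComputesPattern` can be sketched as follows. This is a hypothetical illustration only: the Rabbit software derives compute indexes from the `SystemConfiguration`, while this sketch simply generates `compute-N` names.

```shell
#!/bin/sh
# Sketch of ComputesTarget selection over compute indexes 0..N-1.
# Usage: select_computes TARGET N [PATTERN_INDEX...]
# Prints the chosen compute names, one per line.
select_computes() {
    target="$1"; n="$2"; shift 2
    i=0
    while [ "$i" -lt "$n" ]; do
        case "$target" in
            all)  echo "compute-$i" ;;
            even) if [ $((i % 2)) -eq 0 ]; then echo "compute-$i"; fi ;;
            odd)  if [ $((i % 2)) -eq 1 ]; then echo "compute-$i"; fi ;;
            pattern)
                for want in "$@"; do
                    if [ "$i" -eq "$want" ]; then echo "compute-$i"; fi
                done ;;
        esac
        i=$((i + 1))
    done
}

# even: prints compute-0, compute-2, compute-4, compute-6 (one per line)
select_computes even 8
# pattern: prints only the listed indexes
select_computes pattern 16 0 1 2 3
```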
### Example

```yaml
apiVersion: nnf.cray.hpe.com/v1alpha1
kind: NnfSystemStorage
metadata:
  name: gfs2-systemstorage
  namespace: systemstorage
spec:
  excludeRabbits:
  - "rabbit-1"
  - "rabbit-9"
  - "rabbit-14"
  excludeComputes:
  - "compute-32"
  - "compute-49"
  type: "gfs2"
  capacity: 10000000000
  computesTarget: "pattern"
  computesPattern:
  - 0
  - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  makeClientMounts: true
  clientMountPath: "/mnt/nnf/gfs2"
  storageProfile:
    name: gfs2-systemstorage
    namespace: default
    kind: NnfStorageProfile
```
## lvmlockd Workaround

System storage can be used to work around an lvmlockd bug that occurs when starting the `lvm_global` lockspace. The `lvm_global` lockspace is started only while at least one volume group lockspace is started. After the last volume group lockspace is stopped, the `lvm_global` lockspace is stopped as well. To prevent the `lvm_global` lockspace from being started and stopped so often, a volume group is created on the Rabbits and shared with the computes. The compute nodes can start the volume group lockspace and leave it open.
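Starting and holding the lockspace on a compute node might look like the following sketch. The VG name here is illustrative; the actual name is generated by the Rabbit software and can be listed with `vgs`.

```shell
# List the shared VGs visible on this compute node.
vgs --noheadings -o vg_name

# Start the lockspace for the system-storage VG and leave it started,
# which keeps the lvm_global lockspace held open.
# "system_storage_vg" is a placeholder for the real VG name.
vgchange --lock-start system_storage_vg
```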
System storage can also be used to check whether the PCIe cables between the Rabbit and compute nodes are attached correctly. If the cabling is incorrect, the PCIe switch will make NVMe namespaces available to the wrong compute node. An incorrect cable can only swap the PCIe connections of a compute node with the other compute node in its pair. By creating two system storages, one for compute nodes with an even index and one for compute nodes with an odd index, the PCIe connections can be verified by checking that the correct system storage is visible on each compute node.
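One way to perform that check from a compute node is to list the visible shared VGs and their tags. This is a sketch: the tag names match the `vgcreate --addtag` values used in the example profiles in this document, and the exact `vgs` output depends on the LVM version.

```shell
# Show the VGs visible on this compute node along with their tags.
vgs --noheadings -o vg_name,vg_tags

# On an even-indexed compute node, only the even-tagged VG should be
# visible; flag a cabling problem if the odd-tagged VG appears instead.
if vgs --noheadings -o vg_tags | grep -q lvmlockd-odd; then
    echo "PCIe cabling error: odd-tagged VG visible on an even compute" >&2
fi
```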
### Example

The following example resources show how to create the two system storages used for the lvmlockd workaround. Each system storage creates a `raw` allocation with a volume group but no logical volume. This is the minimum LVM setup needed to start a lockspace on the compute nodes. An `NnfStorageProfile` is created for each of the system storages. Each `NnfStorageProfile` specifies a tag during the `vgcreate` that is used to differentiate between the two VGs. These resources are created in the `systemstorage` namespace, but they could be created in any namespace.
```yaml
apiVersion: nnf.cray.hpe.com/v1alpha1
kind: NnfStorageProfile
metadata:
  name: lvmlockd-even
  namespace: default
data:
  xfsStorage:
    capacityScalingFactor: "1.0"
  lustreStorage:
    capacityScalingFactor: "1.0"
  gfs2Storage:
    capacityScalingFactor: "1.0"
  default: false
  pinned: true
  rawStorage:
    capacityScalingFactor: "1.0"
    commandlines:
      pvCreate: $DEVICE
      pvRemove: $DEVICE
      sharedVg: true
      vgChange:
        lockStart: --lock-start $VG_NAME
        lockStop: --lock-stop $VG_NAME
      vgCreate: --shared --addtag lvmlockd-even $VG_NAME $DEVICE_LIST
      vgRemove: $VG_NAME
---
apiVersion: nnf.cray.hpe.com/v1alpha1
kind: NnfStorageProfile
metadata:
  name: lvmlockd-odd
  namespace: default
data:
  xfsStorage:
    capacityScalingFactor: "1.0"
  lustreStorage:
    capacityScalingFactor: "1.0"
  gfs2Storage:
    capacityScalingFactor: "1.0"
  default: false
  pinned: true
  rawStorage:
    capacityScalingFactor: "1.0"
    commandlines:
      pvCreate: $DEVICE
      pvRemove: $DEVICE
      sharedVg: true
      vgChange:
        lockStart: --lock-start $VG_NAME
        lockStop: --lock-stop $VG_NAME
      vgCreate: --shared --addtag lvmlockd-odd $VG_NAME $DEVICE_LIST
      vgRemove: $VG_NAME
```
Note that the `NnfStorageProfile` resources are marked as `default: false` and `pinned: true`. This is required for `NnfStorageProfile` resources that are used for system storage. The `commandlines` fields for the LV commands are left empty so that no LV is created.
```yaml
apiVersion: nnf.cray.hpe.com/v1alpha1
kind: NnfSystemStorage
metadata:
  name: lvmlockd-even
  namespace: systemstorage
spec:
  type: "raw"
  computesTarget: "even"
  makeClientMounts: false
  storageProfile:
    name: lvmlockd-even
    namespace: default
    kind: NnfStorageProfile
---
apiVersion: nnf.cray.hpe.com/v1alpha1
kind: NnfSystemStorage
metadata:
  name: lvmlockd-odd
  namespace: systemstorage
spec:
  type: "raw"
  computesTarget: "odd"
  makeClientMounts: false
  storageProfile:
    name: lvmlockd-odd
    namespace: default
    kind: NnfStorageProfile
```
The two `NnfSystemStorage` resources each target all of the Rabbits but a different set of compute nodes. This results in each Rabbit having two VGs and each compute node having one VG.

After the `NnfSystemStorage` resources are created, the Rabbit software creates the storage on the Rabbit nodes and makes the LVM VG available to the correct compute nodes. At that point, the `status.ready` field will be `true`. If an error occurs, the `status.error` field will describe the error.
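Readiness can be inspected with `kubectl` once the resources are applied. The file names below are placeholders for wherever the example manifests are saved; the status paths follow the fields described above.

```shell
# Apply the profiles and system storage resources
# (file names are illustrative).
kubectl apply -f lvmlockd-profiles.yaml -f lvmlockd-systemstorage.yaml

# List the system storage resources and check readiness.
kubectl get nnfsystemstorage -n systemstorage
kubectl get nnfsystemstorage lvmlockd-even -n systemstorage \
    -o jsonpath='{.status.ready}'

# On failure, the status.error field describes what went wrong.
kubectl get nnfsystemstorage lvmlockd-even -n systemstorage \
    -o jsonpath='{.status.error}'
```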