Linuxcluster: Hardware
Hardware configuration
The HPC cluster at TUHH-RZ consists of 241 compute nodes, several login nodes and a parallel storage system with a capacity of 350 TB. All in all, about 6600 CPU cores, 32 TB of RAM and several GPUs are available for compute-intensive workloads.
Login nodes
The HPC cluster has several login nodes. Some login nodes may be temporarily unavailable due to maintenance.
If you do not have specific hardware or software requirements, you are advised to use the alias hpclogin.rz.tuhh.de.
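For example, a login via SSH might look like this (a minimal sketch; replace <username> with your TUHH account name):

```bash
# Connect to the recommended login alias
ssh <username>@hpclogin.rz.tuhh.de
```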
Nodes | Cores | CPU Type | RAM | Recommended usage |
---|---|---|---|---|
hpc1.rz.tuhh.de | 2 | (virtual) | 4 GB | managing batch jobs, data transfer |
hpc4.rz.tuhh.de | 2× 10 | 2× E5-2660v3 | 128 GB | managing batch jobs, data transfer, building software, pre- and postprocessing, short test runs |
hpc5.rz.tuhh.de | 2× 10 | 2× E5-2660v3 | 128 GB | managing batch jobs, data transfer, building software, pre- and postprocessing, short test runs |
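The login nodes are also the place for data transfer to and from the cluster. A sketch using standard tools (the local and remote paths are placeholders, not paths that necessarily exist):

```bash
# Copy a local input directory to the cluster home directory
scp -r ./input-data <username>@hpclogin.rz.tuhh.de:~/

# Or synchronize results back from the cluster
rsync -av <username>@hpclogin.rz.tuhh.de:~/results/ ./results/
```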
Compute nodes
Nodes | Cores | CPU Type | RAM | Comment |
---|---|---|---|---|
d[041-044] | 2× 8 | 2× E5-2670 | 64 GB | |
d[045-047] | 2× 10 | 2× E5-2670v2 | 64 GB | |
g[001-016,033-048,081-086] | 2× 12 | 2× E5-2680v3 | 128 GB | |
g[017-032,065-080] | 2× 12 | 2× E5-2680v3 | 256 GB | |
g[087-174,176-216] | 2× 14 | 2× E5-2680v4 | 128 GB | |
g[217-224] | 2× 16 | 2× Xeon Gold 6130 | 192 GB | |
g[225-228] | 2× 24 | 2× Xeon Gold 5318Y | 512 GB | |
u003 | 2× 6 | 2× E5-2620v3 | 64 GB | With four NVIDIA Tesla K80 cards (12 GB memory each)
u[004-006] | 2× 8 | 2× E5-2620v4 | 128 GB | With eight NVIDIA Tesla K80 cards (12 GB memory each)
u007 | 2× 26 | 2× Xeon Gold 6230R | 384 GB | With four NVIDIA Tesla V100 cards (32 GB memory each)
u[008-009] | 2× 36 | 2× Xeon Platinum 8352V | 512 GB | With four NVIDIA Tesla A100 cards (80 GB memory each)
Software
- Operating system: Red Hat Enterprise Linux (RHEL) / CentOS 7 and 8
- Batch system: SLURM (see the example job script after this list)
- Software management with environment modules
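A minimal sketch of a SLURM batch script that loads software via environment modules; the module name, resource requests and program name are assumptions and will differ for real workloads:

```bash
#!/bin/bash
#SBATCH --job-name=example        # name shown in the queue
#SBATCH --nodes=1                 # run on a single compute node
#SBATCH --ntasks=1                # number of tasks (MPI ranks)
#SBATCH --time=00:10:00           # wall-clock limit (hh:mm:ss)
#SBATCH --output=example-%j.out   # output file, %j is the job ID

# Load required software via environment modules (module name is an assumption)
module load gcc

# Run the actual program
./my_program
```

Submit the script with `sbatch job.sh` and inspect the queue with `squeue -u $USER`.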
Storage
- Home directory
  - The home directory is mounted from central file servers of TUHH-RZ and is also available in the Linux PC pools. The file system is backed up and snapshots are available.
  - The standard quota is 10 GB and can be increased on request.
  - Slow storage for crucial data; not well suited for large scientific data sets.
- Local file systems
  - Each node has local storage. Below /usertemp a personal subdirectory is created for each user, like /usertemp/<unix-group>/<username>, e.g. /usertemp/rzt/rztkm.
  - The path /usertemp exists on all nodes but points to the node-local storage. Each node can only access its own /usertemp.
  - Data below /usertemp is not backed up and is subject to deletion after 14 days of inactivity or after a reboot of the node.
  - Fast storage, intended as a local working directory (see the usage sketch at the end of this section).
  - Remote access to the local storage of the compute nodes is possible from the login nodes. It can be mounted on request below /remut, e.g. for node g001:
    ls -l /remut/g001
- Parallel BeeGFS network file system
  - The HPC cluster is equipped with a parallel storage system (BeeGFS).
  - Below /work a personal subdirectory /work/<unix-group>/<username> is created for each user, e.g. /work/rzt/rztkm.
  - The parallel file system is intended for temporary data during simulations. All data is subject to automatic deletion after 90 days of inactivity.
  - Globally visible on all nodes.
  - A tradeoff between the home directory (globally visible, secure, slow, small) and local storage (locally visible, fast); see the usage sketch below.
  - This storage class has no backup; do not use it for permanent storage of important data!
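As a sketch of how these storage classes are typically combined, using the example user rztkm in group rzt from above (directory and program names are placeholders):

```bash
# Stage input data from the small, slow home directory to the parallel file system
cp -r ~/input-data /work/rzt/rztkm/

# Inside a batch job: use the node-local /usertemp as fast scratch space ...
cp -r /work/rzt/rztkm/input-data /usertemp/rzt/rztkm/
cd /usertemp/rzt/rztkm
./my_program input-data

# ... then move results back to /work and archive anything important to $HOME
cp -r /usertemp/rzt/rztkm/results /work/rzt/rztkm/
cp -r /work/rzt/rztkm/results ~/
```

The idea is that /usertemp offers the fastest I/O but is visible only on the node and cleaned after 14 days or a reboot, while /work is visible everywhere but cleaned after 90 days of inactivity; only the home directory is backed up.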