Linuxcluster: Hardware

Hardware configuration

The HPC cluster at TUHH-RZ consists of 241 compute nodes, several login nodes and a parallel storage system with a capacity of 350 TB. In total, about 6600 CPU cores, 32 TB of RAM and several GPUs are available for compute-intensive workloads.

Login nodes

The HPC cluster has several login nodes. Individual login nodes may be temporarily unavailable due to maintenance. If you do not have specific hardware or software requirements, you are advised to use the alias hpclogin.rz.tuhh.de.
Nodes             Cores   CPU Type       RAM      Recommended usage
hpc1.rz.tuhh.de   2       (virtual)      4 GB     managing batch jobs, data transfer
hpc4.rz.tuhh.de   2× 10   2× E5-2660v3   128 GB   managing batch jobs, data transfer, building software,
                                                  pre- and postprocessing, short test runs
hpc5.rz.tuhh.de   2× 10   2× E5-2660v3   128 GB   managing batch jobs, data transfer, building software,
                                                  pre- and postprocessing, short test runs

Compute nodes

Nodes                        Cores   Type                     RAM      Comment
d[041-044]                   2× 8    2× E5-2670               64 GB
d[045-047]                   2× 10   2× E5-2670v2             64 GB
g[001-016,033-048,073-086]   2× 12   2× E5-2680v3             128 GB
g[017-032,049-064,067-072]   2× 12   2× E5-2680v3             256 GB
g[065-066]                   2× 12   2× E5-2680v3             384 GB
g[087-174,176-216]           2× 14   2× E5-2680v4             128 GB
g[217-224]                   2× 16   2× Xeon Gold 6130        192 GB
g[225-228]                   2× 24   2× Xeon Gold 5318Y       512 GB
u003                         2× 6    2× E5-2620v3             64 GB    With four NVIDIA Tesla K80 cards (12 GB memory each)
u[004-006]                   2× 8    2× E5-2620v4             128 GB   With eight NVIDIA Tesla K80 cards (12 GB memory each)
u007                         2× 26   2× Xeon Gold 6230R       384 GB   With four NVIDIA Tesla V100 cards (32 GB memory each)
u[008-009]                   2× 36   2× Xeon Platinum 8352V   512 GB   With four NVIDIA Tesla A100 cards (80 GB memory each)

Software

  • Operating system: Red Hat Enterprise Linux (RHEL) / CentOS 7 and 8
  • Batch system: SLURM
  • Software management: environment modules
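As a sketch, submitting work through SLURM with environment modules might look like the following; the module name, task count, and program name are assumptions for illustration, and the actually available modules and partitions should be checked with module avail and sinfo on the cluster:

```shell
# Sketch of a minimal SLURM batch script using environment modules.
# "gcc" and "./my_simulation" are hypothetical placeholders.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mysim        # job name shown in squeue
#SBATCH --ntasks=28             # e.g. one full E5-2680v4 node (2x 14 cores)
#SBATCH --time=01:00:00         # walltime limit

module load gcc                 # hypothetical module name
srun ./my_simulation            # hypothetical program
EOF
```

The script is then submitted with sbatch job.sh and can be monitored with squeue -u $USER.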

Storage

  • Home directory
    • The home directory is mounted from central file servers of TUHH-RZ and is also available in the Linux PC pools. The file system is backed up and snapshots are available.
    • The standard quota is 10 GB and can be increased on request.
    • Slow storage for crucial data; not suitable for large scientific data sets.
  • Local file systems
    • Each node has local storage. Below /usertemp a personal subdirectory /usertemp/<unix-group>/<username> is created for each user, e.g. /usertemp/rzt/rztkm.
    • The path /usertemp exists on all nodes but always points to that node's local storage; each node can only access its own /usertemp.
    • Data below /usertemp are not backed up and are subject to deletion after 14 days of inactivity or after a reboot of the node.
    • Fast storage, suitable as a local working directory.
    • Remote access to the local storage of the compute nodes is possible from the login nodes; on request it can be mounted below /remut, e.g. for node g001:

      ls -l /remut/g001

  • Parallel BeeGFS network file system
    • The HPC cluster is equipped with a parallel storage system (BeeGFS).
    • Below /work a personal subdirectory /work/<unix-group>/<username> is created for each user, e.g. /work/rzt/rztkm.
    • The parallel file system is intended for temporary data during the simulation. All data is subject to automatic deletion after 90 days of inactivity.
    • Globally visible.
    • A tradeoff between the home directory (globally visible, secure, slow, small) and local storage (visible only locally, fast).
    • This storage class has no backup; do not use it for permanent storage of important data!
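Putting the path conventions above together, the personal scratch and work directories can be derived from the Unix group and user name. As a sketch, the following also lists /work files untouched for more than 80 days, i.e. candidates for the 90-day cleanup; whether the cleanup counts access or modification time is an assumption here, and the 80-day threshold is only illustrative:

```shell
# Derive the personal directories following the conventions above:
# /usertemp/<unix-group>/<username> (node-local) and /work/<unix-group>/<username> (BeeGFS).
SCRATCH="/usertemp/$(id -gn)/$(id -un)"
WORK="/work/$(id -gn)/$(id -un)"
echo "node-local scratch: $SCRATCH"
echo "parallel work dir:  $WORK"

# List files in /work not accessed for more than 80 days (assumes the
# cleanup is based on access time); skipped if the directory does not exist.
if [ -d "$WORK" ]; then
    find "$WORK" -type f -atime +80
fi
```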