
RE: Production Debian as HPC OS - seeking knowledge exchange



Thank you for the information, Henning. Your system looks quite interesting, and it gives us some hope that we might see success running a Debian-based HPC cluster ourselves.

Looking at time zones, our monthly meetings on the 3rd Thursday of each month at 5 PM GMT may not suit your schedule. That said, if it did happen to work, we'd love to have you speak about your experience building and running this cluster. The audience for the Systems-facing track is other HPC system administrators and engineers, so technical detail is very welcome.

You can learn more about CaRCC here: https://carcc.org/people-network/

Right now it is fairly North American-centric, but we are growing into a more international body.

Sincerely,
-M

-----Original Message-----
From: Henning Glawe <glaweh@debian.org> 
Sent: Wednesday, July 12, 2023 8:15 AM
To: Smith, Matthew <matthew.smith@ubc.ca>
Cc: debian-hpc@lists.debian.org
Subject: Re: Production Debian as HPC OS - seeking knowledge exchange

Hi Matthew,

On Fri, Jul 07, 2023 at 06:37:17PM +0000, Smith, Matthew wrote:
> I’m spamming y’all to see if there is anyone on this list running 
> Debian as the OS for an in-production academic HPC cluster.

We are running a Debian Bullseye-based HPC cluster here at MPSD [1].


> I’m part of a group that some of you might be well aware of, the
> CaRCC people network, which puts together monthly talks and
> discussions. Given the recent happenings, we’re looking to secure
> this topic for our September Systems-facing call, and I’m curious if
> there is anyone on this list who is in this position and would be
> comfortable speaking to it. We’d be looking for an overview of the
> cluster setup, configuration, and operational tasks.

I was not aware of CaRCC, and am not sure if and when I could give a talk on our setup.

Summarizing our local HPC hardware (quite heterogeneous):
- /home via nfs4 (FC SAN via Proxmox guest)
- /scratch via cephfs (spinning-disk HP Apollo)
- login nodes as Proxmox guests
- compute nodes:
  - many old 16-core nodes with 64G RAM
  - some newer 4-socket nodes with 2T RAM
  - some NVIDIA V100 GPU nodes
  - a few POWER8 nodes
  - 10GbE / InfiniBand FDR interconnects

Software/OS setup:
- Install/Config management via FAI [2], config in local git
- Debian Bullseye with some official and local backports
- Microarchitecture-optimized HPC toolchains via Spack [3] and EasyBuild
  (legacy); see the Spack sketch just below this list
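
For a rough idea of the Spack side, a minimal sketch of requesting
microarchitecture-optimized builds could look like the lines below (the
package, compiler version and CPU targets are illustrative, not our exact
configuration):

  # build the same library for two CPU generations
  spack install fftw %gcc@12.2.0 target=skylake_avx512
  spack install fftw %gcc@12.2.0 target=zen2

  # users then load the variant matching the node they run on
  spack load fftw target=zen2

The same specs can also be kept in a Spack environment (spack.yaml) under
version control.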

User services:
- generic SLURM (job scheduler); see the example batch script below the list
- Buildbot workers for the HPC TDDFT code "Octopus" [4]
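
To give a flavour of the user side, a minimal SLURM batch script for the
V100 nodes might look like the following sketch (the partition and gres
names are assumptions for illustration; the real names are site-specific):

  #!/bin/bash
  #SBATCH --job-name=octopus-test   # illustrative job name
  #SBATCH --partition=gpu           # assumed partition name
  #SBATCH --gres=gpu:v100:1         # assumed gres label for one V100
  #SBATCH --ntasks=4
  #SBATCH --time=02:00:00

  module load octopus               # assuming a corresponding module exists
  srun octopus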


[1] https://mpsd.mpg.de
[2] https://fai-project.org/
[3] https://spack.readthedocs.io/en/latest/
[4] https://octopus-code.org/
--
Kind regards
Henning Glawe

Dr. Henning Glawe
Max-Planck-Institut für Struktur und Dynamik der Materie
Geb. 99 (CFEL), Luruper Chaussee 149, 22761 Hamburg, Germany
http://www.mpsd.mpg.de/, Email: henning.glawe@mpsd.mpg.de
Building/Room: 99/O2.100, Phone: +49-40-8998-88392
