
Booting the kernel on very large NUMA systems



Hi!

My department at the university has inherited an SGI Altix UV-1000
compute cluster [1]. It consists of two blade centers with 16 blades
each. Each blade has 64 GB of local memory and two Intel Xeon X7560
CPUs, for a total of 64 CPUs (1024 logical CPUs with multi-core and
Hyper-Threading enabled) and 2 TiB of RAM.

The blades are interconnected through NUMAlink [2], so all blades
form a single logical node with 1024 CPUs and 2 TiB of RAM.
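
As a quick sanity check of these numbers, here is a minimal Python
sketch. The 8 cores per socket and 2 hardware threads per core are
assumptions taken from the published X7560 specification, not
something I measured on the machine, and the names are made up for
illustration.

```python
# Topology arithmetic for the UV-1000 as described above.
ENCLOSURES = 2
BLADES_PER_ENCLOSURE = 16
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 8     # assumption: Xeon X7560 specification
THREADS_PER_CORE = 2     # assumption: Hyper-Threading enabled
MEM_PER_BLADE_GIB = 64


def topology(blades):
    """Return socket count, logical CPU count and memory for `blades` blades."""
    sockets = blades * SOCKETS_PER_BLADE
    logical_cpus = sockets * CORES_PER_SOCKET * THREADS_PER_CORE
    mem_gib = blades * MEM_PER_BLADE_GIB
    return {"sockets": sockets, "logical_cpus": logical_cpus, "mem_gib": mem_gib}


full = topology(ENCLOSURES * BLADES_PER_ENCLOSURE)  # all 32 blades
single = topology(1)                                # one blade on its own

print(full)
print(single)
```

The full configuration works out to 64 sockets, 1024 logical CPUs and
2048 GiB (2 TiB); a single blade gives 32 logical CPUs and 64 GiB.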

The machine was originally shipped with SuSE Linux Enterprise
Server 11 (SLES) running kernel 2.6.32, if memory serves.
I have replaced the SLES installation with a stock Debian Wheezy
(after making a full backup of the original SLES installation),
since this is the distribution of our choice.

After getting familiar with the system, I found that it is anything
but trivial to get Linux to boot on it. First, I ran into problems
with GRUB, which had trouble with the number of e820 memory table
entries; this was resolved by a more recent GRUB release [3].

With GRUB working fine, I ran into problems with the kernel, which
apparently simply froze when trying to boot. I tried various Linux
distributions and kernels without success. However, it turned out
that the kernel boots just fine when NUMAlink is disabled. In that
case only the first of the 32 blades is used, which reduces the
machine to 32 CPUs and 64 GB RAM. That is obviously not what you
want from a machine which consumes 33 kW of power ;).

Anyway, I did some further research and it turns out that SGI
passes a very long list of kernel parameters when booting the
machine, namely:

"/sgiroot  splash=silent showopts stop_machine.lazy=1 add_efi_memmap
nortsched processor.max_cstate=1 nobau log_buf_len=8M kdb=on
cgroup_disable=memory earlyprintk=ttyS0,115200n8 pcie_aspm=on nohz=off
crashkernel=512M intel_iommu=off init=/sbin/bootcpuset
console=ttyS0,115200n8 "

Most of these are explained here [4] and are obviously part of
the vanilla Linux kernel. However, the parameter "stop_machine.lazy"
appears to be exclusive to SuSE kernels [5].
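
To compare this list against whatever command line another kernel is
booted with, the string can be split into a dictionary. This is only
a sketch: parse_cmdline and STOCK_CMDLINE are names made up for
illustration, and the stock command line is a placeholder, not what
Debian actually passes on this machine.

```python
# SGI's command line, copied verbatim from the quote above.
SGI_CMDLINE = (
    "/sgiroot  splash=silent showopts stop_machine.lazy=1 add_efi_memmap "
    "nortsched processor.max_cstate=1 nobau log_buf_len=8M kdb=on "
    "cgroup_disable=memory earlyprintk=ttyS0,115200n8 pcie_aspm=on nohz=off "
    "crashkernel=512M intel_iommu=off init=/sbin/bootcpuset "
    "console=ttyS0,115200n8 "
)

# Hypothetical placeholder for a stock command line.
STOCK_CMDLINE = "root=/dev/sda1 ro quiet"


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    A repeated name (e.g. several console= entries) keeps only the last
    value, which is a simplification of what the kernel does.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


sgi = parse_cmdline(SGI_CMDLINE)
stock = parse_cmdline(STOCK_CMDLINE)

# Parameters SGI sets that the other command line does not mention.
only_in_sgi = sorted(set(sgi) - set(stock))
for name in only_in_sgi:
    print(name, sgi[name])
```

Note that the leading "/sgiroot" token is treated as a bare flag here,
since I do not know what it was originally attached to.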

Now, I am wondering whether the SuSE patch is actually what gets
the kernel booting on the UV1000 with NUMAlink enabled. I haven't
built a kernel with the patch added yet, but I will do that once
I get back to work in the new year.

I was just wondering if anyone has more suggestions as to what I
could look into and what might cause the kernel to freeze immediately
after GRUB with NUMAlink enabled. It freezes right after decompressing
the kernel.

Any idea?

Cheers,

Adrian

PS: Some documentation on the UV from SGI [6-7].

[1] http://www.sgi.com/products/remarketed/servers/uv1.html
[2] http://en.wikipedia.org/wiki/NUMAlink
[3] http://git.savannah.gnu.org/cgit/grub.git/commit/?id=a4e5ca80d97077cf302223a7c6aa38a2a9bedf8a
[4] https://access.redhat.com/site/articles/42548
[5] http://kernel.opensuse.org/cgit/kernel-source/commit/?id=39eac1e710e6c9c8a524ad9a6319a3426e872894
[6] http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Admin/books/UV_Wind_Install_AG/sgi_html/ch01.html#Z1299705322tls
[7] http://techpubs.sgi.com/library/manuals/5000/007-5663-003/pdf/007-5663-003.pdf

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

