Hi Petter,
> Btw, can you say more about why you used the b2110 branch and clang 15 instead of the default?
b2110 is a tag, not a branch. I checked out a specific version of llama-cpp to make my instructions more reproducible. At the time I wrote them, b2110 was the most recent tag.
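For reference, pinning the checkout to that tag looks like this (the upstream repository URL is shown for illustration):

```shell
# Clone llama.cpp and check out the b2110 tag so the build
# is reproducible rather than tracking the moving master branch.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b2110
```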
The use of clang-15 is required because some of the low-level components in the ROCm stack (e.g., rocm-device-libs) are tightly coupled to a specific LLVM version. At any given time, there is only one version of clang packaged for Debian that can be used to compile code in the HIP language. At the moment, that is clang-15, though I expect it will be clang-17 by next month.
> Btw, would whisper-cpp be a better match for Debian than the openai-whisper implementation?
I know very little about AI applications, but I see that whisper-cpp has a hipblas backend that can be enabled with `-DWHISPER_HIPBLAS=ON`. A quick review of the codebase suggests that all of the dependencies required to package whisper-cpp with GPU acceleration are probably already in Debian.
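As a rough sketch, enabling that backend from a whisper-cpp checkout might look like the following. The compiler names assume Debian's clang-15 packages, per the HIP constraint above; upstream may instead expect the hipcc wrapper, so treat this as illustrative rather than a tested recipe:

```shell
# Configure whisper-cpp with the hipblas backend enabled.
# clang-15 / clang++-15 are Debian's HIP-capable compilers at present.
cmake -B build \
      -DWHISPER_HIPBLAS=ON \
      -DCMAKE_C_COMPILER=clang-15 \
      -DCMAKE_CXX_COMPILER=clang++-15
cmake --build build
```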
The openai-whisper implementation seems to depend on pytorch. Based on our progress over the past year, I think we'll get pytorch-rocm packaged some time this year, but there are still a few more ROCm packages we need before that can happen (miopen, roctracer, composable-kernel [maybe], hipblaslt [maybe]).
If we want to have a GPU-accelerated whisper library working ASAP, perhaps it would be better to package whisper-cpp.
Sincerely,
Cory Bloor
P.S. The performance of rocblas/hipblas is heavily
dependent on Tensile tuning. If Debian users want better
performance in particular applications, we can run the
applications with rocblas logging [1] and tensile logging [2]
enabled and then add Tensile kernels for those specific problems.
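For example, capturing a replayable log of the BLAS calls an application makes might look like this (a sketch based on the variables described in [1]; `./app` is a stand-in for the real binary, and exact layer bits may vary by ROCm version):

```shell
# ROCBLAS_LAYER is a bitmask selecting the logging mode;
# bit 2 requests "bench" logging, which records each BLAS call
# in a form that can be replayed for Tensile tuning.
export ROCBLAS_LAYER=2
export ROCBLAS_LOG_BENCH_PATH=./rocblas_bench.log
./app   # placeholder for the application being profiled
```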
[1]: https://rocm.docs.amd.com/projects/rocBLAS/en/docs-6.0.2/API_Reference_Guide.html#logging-in-rocblas
[2]: https://github.com/ROCm/Tensile/wiki/Environment-Variables