Hi Petter,
> Btw, can you say more about why you used the b2110 branch and clang 15 instead of the default?
b2110 is a tag, not a branch. I checked out a specific version of llama-cpp to make my instructions more reproducible. At the time I wrote them, b2110 was the most recent tag.
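For reference, pinning the checkout to that tag looks like this (the upstream repository URL is shown for illustration):

```shell
# Clone llama.cpp and check out the b2110 tag so the build
# is reproducible rather than tracking the moving master branch.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b2110
```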
The use of clang-15 is required because some of the low-level components in the ROCm stack (e.g., rocm-device-libs) are tightly coupled to a specific LLVM version. At any given time, there is only one version of clang packaged for Debian that can be used to compile code in the HIP language. At the moment, that is clang-15, though I expect it will be clang-17 by next month.
> Btw, would whisper-cpp be a better match for Debian than the openai-whisper implementation?
I know very little about AI applications, but I see that whisper-cpp has a hipblas backend that can be enabled with `-DWHISPER_HIPBLAS=ON`. A quick review of the codebase suggests that all of the dependencies required to package whisper-cpp with GPU acceleration are probably already in Debian.
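As a rough sketch, enabling that backend from a whisper-cpp checkout might look like the following. The compiler names assume Debian's clang-15 packages, per the HIP constraint above; upstream may instead expect the hipcc wrapper, so treat this as illustrative rather than a tested recipe:

```shell
# Configure whisper-cpp with the hipblas backend enabled.
# clang-15 / clang++-15 are Debian's HIP-capable compilers at present.
cmake -B build \
      -DWHISPER_HIPBLAS=ON \
      -DCMAKE_C_COMPILER=clang-15 \
      -DCMAKE_CXX_COMPILER=clang++-15
cmake --build build
```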
The openai-whisper implementation seems to depend on pytorch. Based on our progress over the past year, I think we'll get pytorch-rocm packaged some time this year, but there are still a few more ROCm packages we need before that can happen (miopen, roctracer, composable-kernel [maybe], hipblaslt [maybe]).
If we want to have a GPU-accelerated whisper library working ASAP, perhaps it would be better to package whisper-cpp.
Sincerely,
Cory Bloor
P.S. The performance of rocblas/hipblas is heavily
dependent on Tensile tuning. If Debian users want better
performance in particular applications, we can run the
applications with rocblas logging [1] and tensile logging [2]
enabled and then add Tensile kernels for those specific problems.
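For example, capturing a replayable log of the BLAS calls an application makes might look like this (a sketch based on the variables described in [1]; `./app` is a stand-in for the real binary, and exact layer bits may vary by ROCm version):

```shell
# ROCBLAS_LAYER is a bitmask selecting the logging mode;
# bit 2 requests "bench" logging, which records each BLAS call
# in a form that can be replayed for Tensile tuning.
export ROCBLAS_LAYER=2
export ROCBLAS_LOG_BENCH_PATH=./rocblas_bench.log
./app   # placeholder for the application being profiled
```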
[1]: https://rocm.docs.amd.com/projects/rocBLAS/en/docs-6.0.2/API_Reference_Guide.html#logging-in-rocblas
[2]: https://github.com/ROCm/Tensile/wiki/Environment-Variables