Segfault in hsa_init on Debian-derived systems



Hello,

One of the things that has vexed me about the 'universe' packages for ROCm on Ubuntu is that they simply didn't work. On Ubuntu 22.04 and 22.10, even a simple `rocminfo` invocation would crash with a segfault. I assumed that Ubuntu was doing something to the Debian packages that broke them, but I was wrong.

I've finally figured out what is happening. This is the static initialization order fiasco [1]. Inside libhsa-runtime64, there are two variables with static storage duration in different translation units, and the code implicitly expects one to be initialized before the other. That expectation happens to be satisfied when rocr-runtime is built for Debian, but the order comes out differently when the package is built for Ubuntu, and the expectation is violated. When the variables are initialized out of order, data is copied from one static variable to the other before the source has been initialized, so the copied data (including pointers) is unexpectedly null.
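
For illustration, here is a minimal sketch of that failure mode. The names `Table`, `g_table` and `g_cached` are hypothetical stand-ins, not the actual variables in libhsa-runtime64. In the real library the two variables live in different translation units, so their relative order is unspecified. The sketch keeps both in one file, where dynamic initialization follows definition order, so the bad order is reproduced deterministically.

```cpp
// Hypothetical stand-ins; not the actual rocr-runtime variables.
struct Table {
    int version;
};

// Declaration only. The definition (and its initializer) comes later,
// which mimics "the other translation unit gets initialized second".
extern Table* g_table;

// Runs first. At this point g_table has only been zero-initialized,
// so the value copied here is a null pointer.
Table* g_cached = g_table;

// Runs second. The allocation is intentionally never freed; the object
// is meant to live for the whole program, as a global table would.
Table* g_table = new Table{1};

// Small helpers so the state after initialization can be inspected.
bool cached_is_null() { return g_cached == nullptr; }
bool table_is_null() { return g_table == nullptr; }
```

After static initialization has finished, `g_table` points at a valid object but `g_cached` still holds the null pointer it copied too early. Dereferencing such a stale copy is the kind of access that ends in a segfault.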

This problem technically does not affect the Debian binary package, but only by coincidence. Any change to the build toolchain could potentially change the initialization order and thereby introduce this defect into the library the next time it is built from source.

I'm not sure what priority this issue is for Debian. In fact, I'm not sure if it even is technically a bug on Debian given that it hasn't manifested itself yet. Nevertheless, I hope we can upload a fix for this relatively quickly given that both Debian [2] and Ubuntu [3] will be freezing packages soon.

I've prepared a patch to ensure the correct initialization order [4]. If there's anything more that I can do to help, please let me know.
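
For readers unfamiliar with this class of bug, one common remedy described on the cppreference page [1] is the construct-on-first-use idiom. The sketch below shows that idiom under assumed names (`Table`, `table()`, `g_cached_name` are hypothetical). It is not a reproduction of the patch in [4], which may take a different approach.

```cpp
#include <string>

// Hypothetical stand-in for the data that other statics depend on.
struct Table {
    std::string name;
};

// Construct on first use: the function-local static is initialized the
// first time table() is called, so callers can never observe it before
// it has been constructed, whatever order translation units are
// initialized in.
Table& table() {
    static Table instance{"example"};
    return instance;
}

// A dependent static goes through the accessor instead of reading a
// namespace-scope variable directly. This would remain correct even if
// it were defined in a different translation unit.
const std::string g_cached_name = table().name;
```

The trade-off is that every access pays for a function call and an initialization check, and the order of destruction at program exit still needs care if destructors of other statics use the table.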

Sincerely,
Cory Bloor

[1]: https://en.cppreference.com/w/cpp/language/siof
[2]: https://release.debian.org/bookworm/freeze_policy.html
[3]: https://discourse.ubuntu.com/t/lunar-lobster-release-schedule/27284
[4]: https://salsa.debian.org/rocm-team/rocr-runtime/-/commit/b4bfd3e07426034ba65a5f4adf05b41c23c3eb81


