
Re: Blender crash with packaged ROCm 5.2.3 drivers



Hi Jakub,

Jakub Jaszewski, on 2022-11-06:
> I found that the crash is caused by the `libamd-comgr2` package. I assumed it
> was necessary since I remember it being installed with amdgpu-pro.
> After uninstalling it there is no crash, but Blender still reports no HIP
> devices:
> 
> I1106 23:33:10.011876 123411 device.cpp:32] HIPEW initialization succeeded
> I1106 23:33:10.011904 123411 device.cpp:34] Found precompiled kernels
> HIP hipGetDeviceCount: No HIP-capable device available

Thanks for the hint.  Once I make sure libamd-comgr2 is installed,
I can reproduce the crash mentioned on the Blender bug tracker;
the backtrace from a run under the debugger is below:

	mesa: CommandLine Error: Option 'h' registered more than once!
	LLVM ERROR: inconsistency in registered CommandLine options
	
	Thread 1 "blender" received signal SIGABRT, Aborted.
	__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
	44	./nptl/pthread_kill.c: No such file or directory.
	(gdb) bt
	#0  __pthread_kill_implementation
	    (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
	    at ./nptl/pthread_kill.c:44
	#1  0x00007fffef8a9d2f in __pthread_kill_internal
	    (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
	#2  0x00007fffef85aef2 in __GI_raise (sig=sig@entry=6)
	    at ../sysdeps/posix/raise.c:26
	#3  0x00007fffef845472 in __GI_abort () at ./stdlib/abort.c:79
	#4  0x00007fff5308bc2b in llvm::report_fatal_error(llvm::Twine const&, bool) ()
	    at /lib/x86_64-linux-gnu/libLLVM-15.so.1
	#5  0x00007fff5308ba76 in  () at /lib/x86_64-linux-gnu/libLLVM-15.so.1
	#6  0x00007fff5307334e in  () at /lib/x86_64-linux-gnu/libLLVM-15.so.1
	#7  0x00007fff53064cbb in llvm::cl::Option::addArgument() ()
	    at /lib/x86_64-linux-gnu/libLLVM-15.so.1
	#8  0x00007ffe2d9fd831 in  () at /lib/x86_64-linux-gnu/libamd_comgr.so.2
	#9  0x00007ffe2d9ba294 in  () at /lib/x86_64-linux-gnu/libamd_comgr.so.2
	#10 0x00007ffff7fcfabe in call_init
	    (env=0x7fffffffe260, argv=0x7fffffffe248, argc=2, l=<optimized out>)
	    at ./elf/dl-init.c:70
	#11 call_init
	    (l=<optimized out>, argc=2, argv=0x7fffffffe248, env=0x7fffffffe260)
	    at ./elf/dl-init.c:26
	#12 0x00007ffff7fcfba4 in _dl_init
	    (main_map=0x7fff48ef0300, argc=2, argv=0x7fffffffe248, env=0x7fffffffe260)
	    at ./elf/dl-init.c:117
	#13 0x00007fffef96de94 in __GI__dl_catch_exception
	    (exception=<optimized out>, operate=<optimized out>, args=<optimized out>)
	    at ./elf/dl-error-skeleton.c:182
	#14 0x00007ffff7fd630e in dl_open_worker (a=a@entry=0x7fffffffc260)
	    at ./elf/dl-open.c:808
	#15 0x00007fffef96de3a in __GI__dl_catch_exception
	    (exception=<optimized out>, operate=<optimized out>, args=<optimized out>)
	    at ./elf/dl-error-skeleton.c:208
	#16 0x00007ffff7fd66a8 in _dl_open
	    (file=0x7fff31fc5475 "libamd_comgr.so.2", mode=<optimized out>, caller_dlopen=0x7fff31f36466, nsid=<optimized out>, argc=2, argv=0x7fffffffe248, env=0x7fffffffe260) at ./elf/dl-open.c:884
	#17 0x00007fffef8a42d8 in dlopen_doit (a=a@entry=0x7fffffffc4d0)
	    at ./dlfcn/dlopen.c:56
	#18 0x00007fffef96de3a in __GI__dl_catch_exception
	    (exception=exception@entry=0x7fffffffc430, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
	#19 0x00007fffef96deef in __GI__dl_catch_error
	    (objname=0x7fffffffc488, errstring=0x7fffffffc490, mallocedp=0x7fffffffc487, operate=<optimized out>, args=<optimized out>)
	    at ./elf/dl-error-skeleton.c:227
	#20 0x00007fffef8a3dc7 in _dlerror_run (operate=operate@entry=0x7fffef8a4280 <dlopen_doit>, args=args@entry=0x7fffffffc4d0) at ./dlfcn/dlerror.c:138
	#21 0x00007fffef8a4389 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>) at ./dlfcn/dlopen.c:71
	#22 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
	#23 0x00007fff31f36466 in  () at /lib/x86_64-linux-gnu/libamdhip64.so
	#24 0x00007fff31f0c0f1 in  () at /lib/x86_64-linux-gnu/libamdhip64.so
	#25 0x00007fffef8ace37 in __pthread_once_slow (once_control=0x7fff329a1ed8, init_routine=0x7fffefad3200 <__once_proxy>) at ./nptl/pthread_once.c:116
	#26 0x00007fff31f0f5a9 in  () at /lib/x86_64-linux-gnu/libamdhip64.so
	#27 0x00007fff31f604c3 in  () at /lib/x86_64-linux-gnu/libamdhip64.so
	#28 0x00007fff31f60fdd in  () at /lib/x86_64-linux-gnu/libamdhip64.so
	#29 0x00007fff31f0f19e in  () at /lib/x86_64-linux-gnu/libamdhip64.so
	#30 0x00007fff31f52dfe in  () at /lib/x86_64-linux-gnu/libamdhip64.so
	#31 0x00007fff31cc676c in  () at /lib/x86_64-linux-gnu/libamdhip64.so
	#32 0x00007fff31cc75ad in hipInit () at /lib/x86_64-linux-gnu/libamdhip64.so
	#33 0x0000555557e38824 in ccl::device_hip_safe_init () at ./intern/cycles/device/hip/device.cpp:96
	#34 ccl::device_hip_info(ccl::vector<ccl::DeviceInfo, ccl::GuardedAllocator<ccl::DeviceInfo> >&) (devices=...) at ./intern/cycles/device/hip/device.cpp:104
	#35 0x0000555557e20b7a in ccl::Device::available_devices(unsigned int) (mask=34) at ./intern/cycles/device/device.cpp:228
	#36 0x0000555557bbbc3d in ccl::available_devices_func(PyObject*, PyObject*) (args=<optimized out>) at ./intern/cycles/blender/python.cpp:416
	#37 0x00007fffeff28413 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#38 0x00007fffefedebce in _PyObject_MakeTpCall () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#39 0x00007fffefe79cb4 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#40 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#41 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#42 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#43 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#44 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#45 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#46 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#47 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#48 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#49 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
	#50 0x0000555556ac015f in bpy_class_call (C=0x7fffd967e2b8, ptr=<optimized out>, func=0x55555ac15da0 <rna_Panel_draw_func>, parms=0x7fffffffdca0) at ./source/blender/python/intern/bpy_rna.c:8690
	#51 0x0000555556a5da5c in panel_draw (C=<optimized out>, panel=0x7fff37d7b338) at ./source/blender/makesrna/intern/rna_ui.c:129
	#52 0x0000555556adafab in ed_panel_draw (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff49fd4238, lb=lb@entry=0x7fff49fd4330, pt=pt@entry=0x7fff4a1e5938, panel=0x7fff37d7b338, panel@entry=0x0, w=484, em=20, unique_panel_str=0x0, search_filter=0x0) at ./source/blender/editors/screen/area.c:2791
	#53 0x0000555556adca43 in ED_region_panels_layout_ex (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff49fd4238, paneltypes=<optimized out>, contexts=contexts@entry=0x7fffffffdf60, category_override=category_override@entry=0x0) at ./source/blender/editors/screen/area.c:2989
	#54 0x00005555584a3be5 in userpref_main_region_layout (C=0x7fffd967e2b8, region=0x7fff49fd4238) at ./source/blender/editors/space_userpref/space_userpref.c:128
	#55 0x0000555556adbb9e in ED_region_do_layout (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff49fd4238) at ./source/blender/editors/screen/area.c:511
	#56 0x00005555565543f5 in wm_draw_window_offscreen (stereo=false, win=0x7fff37d024f8, C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:889
	#57 wm_draw_window (win=0x7fff37d024f8, C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:1111
	#58 wm_draw_update (C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:1338
	#59 0x0000555556550f40 in WM_main (C=C@entry=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm.c:640
	#60 0x0000555555efa1ca in main (argc=2, argv=0x7fffffffe248) at ./source/creator/creator.c:547
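
The abort happens in a static initializer of libamd_comgr.so.2
(frames #7 to #9) while libamdhip64.so is dlopen()ing it.  This kind
of LLVM error typically shows up when the same command-line option
gets registered twice within one process.  As a rough diagnostic
sketch, assuming a Linux /proc filesystem and the PID of the stopped
blender process, the following lists which of the relevant shared
objects are mapped.  The function names and the default pattern are
made up for illustration, and the check cannot see LLVM code linked
statically into another library.

```python
import re
from collections import defaultdict


def loaded_libraries(maps_text, pattern=r"libLLVM|libamd_comgr|libamdhip64"):
    """Return {library path: number of mappings} for paths matching pattern.

    maps_text is the content of /proc/<pid>/maps; the default pattern is
    only an assumption about which libraries are interesting here.
    """
    wanted = re.compile(pattern)
    found = defaultdict(int)
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and wanted.search(fields[5]):
            found[fields[5].strip()] += 1
    return dict(found)


def read_maps(pid):
    """Read the memory map of a running process (Linux only)."""
    with open(f"/proc/{pid}/maps") as handle:
        return handle.read()
```

With blender stopped in gdb, `loaded_libraries(read_maps(pid))` would
show whether more than one libLLVM shared object is mapped alongside
libamd_comgr.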

> Other findings are that my hipconfig under `== hip-clang` section does not
> have any devices listed, and that hipconfig.pl is missing some path:
> 
> InstalledDir: /usr/bin
> Can't exec "/usr/bin/llc": No such file or directory at

I seem to have missed that command when updating the hipconfig
script to call the llvm-15 tools.  This should be fixed in the next
upload[1].

[1]: https://salsa.debian.org/rocm-team/rocm-hipamd/-/commit/f4737f34300f60d7a291e31e5e2dfdc2894cd636
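
To illustrate the kind of lookup involved: hipconfig tried the
unversioned /usr/bin/llc, which does not exist here.  Below is a
minimal sketch, not the actual change from [1] (hipconfig itself is a
Perl script), of locating a versioned LLVM tool.  The function name
and the candidate locations are assumptions based on the usual layout
of Debian's llvm-15 packages.

```python
import os
import shutil


def find_llvm_tool(tool, version, which=shutil.which, exists=os.path.exists):
    """Return a path to an LLVM tool such as llc, or None if not found.

    The lookup order is an assumption for illustration: first the
    versioned install directory, then the versioned name on PATH
    (e.g. llc-15), and finally the unversioned name.
    """
    versioned_dir = f"/usr/lib/llvm-{version}/bin/{tool}"
    if exists(versioned_dir):
        return versioned_dir
    for name in (f"{tool}-{version}", tool):
        found = which(name)
        if found:
            return found
    return None
```

For instance, `find_llvm_tool("llc", 15)` returns None rather than a
stale /usr/bin/llc path when no matching tool is installed.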

[…]
> I don't know how much of this is caused by me not knowing what exactly to
> install and configure and how much by all the stack being so new and
> untested.

This is all very new, so I guess quite a bit of it may be the
latter.  I have begun adding autopkgtests where I can[2] to limit
such breakage in the future.  Thank you very much for your feedback!
:)

[2]: https://salsa.debian.org/rocm-team/rocm-hipamd/-/commit/76d4914fbebea5c7bbd1586493a934a912db5e4a

> Best regards,

Have a nice day,  :)
-- 
Étienne Mollier <emollier@emlwks999.eu>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/1, please excuse my verbosity.
On air: Thank You Scientist - Terraformer
