
Bug#991967: Simply ACPI powerdown/reset issue?



On 9/25/2021 11:27 PM, Elliott Mitchell wrote:
On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote:
I presume you are suggesting I try booting 4.19.181-1 on the
current version of Xen-4.14 for bullseye as a dom0. I am not
inclined to try it until an official Debian developer endorses
your opinion that the bug I am seeing is distinct
from #991967, at which point I will report the bug I am
seeing as a new bug.

Chuck Zmudzinski you are getting rather close to my threshold for calling
harassment.  You're not /quite/ there, but I'm concerned.

Sorry if I offended you in some way, I didn't mean to.

Since the purpose of the bug reports is to find and diagnose bugs, I did
a bit of experimentation and made some observations.

I checked out the Debian Xen source via git.  I got the current
"master" branch which is presently the candidate 4.14.3-1 version,
which includes urgent fixes.  The hash is:
e7a17db0305c8de891b366ad37777528e5a43015

On top of this I cherry-picked 3 commits from Xen's main branch:
5a4087004d1adbbb223925f3306db0e5824a2bdc
0f089bbf43ecce6f27576cb548ba4341d0ec46a8
bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

By main branch, I presume you mean the unstable
4.16 branch of Xen. Correct?
(these can be retrieved via Xen's gitweb at
https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
suitable for the `git am` command)
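
As an illustration of the retrieval step described above, here is a minimal sketch that builds the gitweb patch URL for each of the three cherry-picked hashes. The URL template and the hashes are taken from this message; the names `GITWEB_PATCH_URL`, `CHERRY_PICKS`, `patch_url` and `patch_urls` are hypothetical helpers, and the script only prints URLs, it does not download or apply anything.

```python
# URL template copied from the gitweb link quoted in this message.
GITWEB_PATCH_URL = "https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h={commit}"

# The three cherry-picked commits, in the order they are listed above.
# (Whether this is also the order they were applied in is an assumption.)
CHERRY_PICKS = [
    "5a4087004d1adbbb223925f3306db0e5824a2bdc",
    "0f089bbf43ecce6f27576cb548ba4341d0ec46a8",
    "bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b",
]


def patch_url(commit: str) -> str:
    """Return the gitweb URL that serves `commit` as a patch."""
    return GITWEB_PATCH_URL.format(commit=commit)


def patch_urls(commits=CHERRY_PICKS):
    """Return one patch URL per commit hash, preserving the input order."""
    return [patch_url(commit) for commit in commits]


if __name__ == "__main__":
    for url in patch_urls():
        print(url)
```

Each printed URL can be fetched with any HTTP client and the saved file passed to `git am` inside the Debian Xen checkout.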

With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
4.19.194-3 (this system is presently mostly on oldstable).  The results
were:

Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful

Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung


Interesting. Looks like you are homing in on solving this bug. I notice
at the beginning of this message you quoted an older message of mine
which does not take into account that I have reported a new bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899
because I did come to the conclusion, as you did, that there are
in fact two bugs.

I wonder if the results of your modified Xen 4.14.3-1 with
4.19.181-1 and 4.19.194-3 on my hardware would be of help.
As you might recall, I have an older (Haswell) Intel system with
EFI boot and systemd for init/shutdown services.
If I get the same result, then I would agree we are seeing a
regression between those two versions of Linux. Otherwise,
there may also be some tests involving EFI vs. BIOS to
do. Or, based on what I have learned at #994899, we may also
need to check systemd vs. sysv-init. Do you want me to
do the test on my hardware?

Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
Linux 4.19.181-1. I believe this combination would have hung during
reboot.

I can confirm it did hang on my hardware with this combination of
Xen and Linux versions.

As such, I believe there are in fact two distinct bugs being observed.
The presence of EITHER of these is sufficient to cause hangs during
powerdown or reboot.

And we already have two distinct bugs on BTS.

First, some patch originally from Linux's main branch that breaks Xen
reboots was backported somewhere between 4.19.181-1 and 4.19.194-3.  This may
either have been introduced before 5.10 diverged from main, or may also
have been backported to 5.10.  THIS is Debian bug #991967.

I agree. I believe you.

Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
valuable to ARM devices breaks reboots and powerdowns on x86.  This is
correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

Presently this has no Debian bug report.

That looks a lot like #994899. Have you ruled out the possibility that
this bug is #994899 in disguise? If so, how? Or do you think #994899
is a third bug?

The first is presently unidentified, someone enthusiastic either needs to
read git logs/source code, or bisect and build to find where it got
broken.

Yeah, that's a lot of work. That's how I found my solution for #994899.
For that bug, since the working version was Xen 4.11 and the broken
version was Xen 4.14, the cause could have been in 4.12, 4.13, or 4.14.
So that required a bit of detective work studying git logs, but in the
end, I just tested 4.12, and it was good, then 4.13 and it was good.
I also tested the first Debian version of 4.14, which was actually
experimental on Debian if I recall correctly. It did not include the
RPI4 patches, and it was good too. So I knew the bug was introduced
sometime after that, and I soon identified the RPI4 patches as the place
where the bug (#994899) first appeared on my hardware.
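
The search described above can be sketched as a bisection over an ordered list of versions. This is only an illustration, not the procedure actually used (the versions above were tested one after another). It assumes the versions are ordered oldest to newest and that once the hang appears it stays in every later version. The function name `first_bad`, the `hangs` callback and the version labels in the usage example are hypothetical; in practice `hangs` stands for building a version, booting it and attempting a reboot by hand.

```python
from typing import Callable, Optional, Sequence


def first_bad(versions: Sequence[str],
              hangs: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest version for which `hangs` is true, or None.

    Assumes `versions` is ordered oldest to newest and that every
    version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if hangs(versions[mid]):
            hi = mid          # first bad version is at mid or earlier
        else:
            lo = mid + 1      # first bad version is after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    # Hypothetical labels standing in for the builds mentioned above.
    builds = ["4.11", "4.12", "4.13", "4.14-experimental", "4.14-rpi4-patches"]
    # Stand-in for a manual reboot test of each build.
    print(first_bad(builds, lambda v: v == "4.14-rpi4-patches"))
```

With only a handful of candidate versions a linear walk is just as practical; bisection pays off when the range covers many commits, as with the unidentified Linux regression between 4.19.181-1 and 4.19.194-3.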

For the second, we seem to have a fix.  The only question is how many
patches to cherry pick?  bc141e8ca562 is non-urgent as it is merely
superficial and not needed for functionality.
5a4087004d1a is a workaround for Linux kernel breakage, but how likely
are we to see that fixed in the Linux kernel packages?  The fix is
well-contained and needed for some highly popular ARM devices.



When you decide what to do here, I would like to check it to
see if it works on my hardware. If you don't hear anything
from me, you can assume it worked fine on my hardware.

Cheers,

Chuck

