
Bug#991967: #991967: Simply ACPI powerdown/reset issue?



On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso <carnil@debian.org> wrote:
> Hi Elliott,
>
> On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > An experiment led to a potential alternative explanation for #991967.
> > The issue may be that ACPI (non-UEFI) powerdown/reset was broken at
> > 4.19.194-3. Presence of Xen on the system may be unrelated.
> >
> > Failing that, it could be that Xen on non-UEFI systems is affected. (Xen
> > was tried on a UEFI system and the issue wasn't observed.)
>
> Following up on https://bugs.debian.org/991967#12
>
> Did you succeed in bisecting the issue, as you seem to have it
> reproducible?
>
> Regards,
> Salvatore
>
>

Hello Elliott and Salvatore,

I have noticed this bug ever since I started running
bullseye as a dom0, but my testing indicates that
there is no problem with src:linux; instead, the problem
appeared in src:xen with the 4.14 version of Xen on
bullseye.

Elliott, are you only seeing the problem on Debian's
xen-4.14 hypervisor? Also, which architecture, arm or
amd64? I only see the problem on the Debian xen-4.14
hypervisor, and I have only tested on amd64. I have
found a fix for my amd64 system, which has the
following configuration:

Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
with a Haswell CPU (core i5-4590S)

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel
for bullseye)

Boot method: EFI, without Secure Boot. The Xen
hypervisor and the bullseye dom0 are booted with
bullseye's grub-efi package, which loads the
xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.

I also tested a buster dom0 with the 4.19 series kernel
on the xen-4.14 hypervisor from bullseye and saw the
problem, but I did not see the problem with either
a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
the xen-4.11 hypervisor, so I think the problem is
with the Debian version of the xen-4.14 hypervisor,
not with src:linux.

I also found a fix in src:xen:

I noticed that the patch series in debian/patches of the
4.14.2+25-gb6a8c4f72d-2 version of src:xen (and of earlier
xen-4.14 versions in Debian) includes several patches
backported from the unstable branch of upstream Xen. After
removing some of these patches from the patch series of
the src:xen package, the dom0 shuts down as expected on
my ASRock Haswell motherboard.

I removed the following patches from the debian/patches
series and rebuilt the src:xen package. When I boot the
rebuilt hypervisor, the computer shuts down as expected:

0027-xen-rpi4-implement-watchdog-based-reset.patch
0028-tools-python-Pass-linker-to-Python-build-process.patch
0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch

Most of these patches seem unrelated to the amd64
architecture and instead affect the arm architecture, and
removing all of them is probably more than is needed to
fix this bug. I removed them all because I could not find
them upstream on the 4.14 branch, only on the upstream
xen unstable branch (I did not check whether they are on
the upstream 4.15 branch). I wanted to test a true
upstream 4.14 version without these seemingly aggressive
patches that Debian added from the upstream unstable
branch, and I discovered that being more conservative and
leaving these patches out fixed the problem!
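For anyone wanting to reproduce this rebuild, leaving the nine patches listed above out of debian/patches/series can be sketched in Python as below. This is a minimal sketch, not a Debian tool: `DROPPED`, `filter_series` and `drop_patches` are hypothetical names, and it assumes the series file is a plain quilt series with one patch name per line (optionally followed by options) plus '#' comments.

```python
from __future__ import annotations

from pathlib import Path

# The nine patches left out of debian/patches/series in the rebuild above.
DROPPED = {
    "0027-xen-rpi4-implement-watchdog-based-reset.patch",
    "0028-tools-python-Pass-linker-to-Python-build-process.patch",
    "0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch",
    "0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch",
    "0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch",
    "0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch",
    "0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch",
    "0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch",
    "0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch",
}


def filter_series(series_text: str, dropped: set[str]) -> str:
    """Return quilt series text with the named patches left out.

    Blank lines and comment lines are kept unchanged. Only the first
    field of each entry is compared, since quilt allows options such
    as -p1 after the patch name.
    """
    kept = []
    for line in series_text.splitlines():
        fields = line.split()
        if fields and fields[0] in dropped:
            continue  # skip this patch entry
        kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


def drop_patches(series_path: str = "debian/patches/series") -> None:
    """Rewrite the series file in place without the DROPPED patches."""
    path = Path(series_path)
    path.write_text(filter_series(path.read_text(), DROPPED))
```

Calling `drop_patches()` from the top of the unpacked src:xen source tree rewrites the series file; the package can then be rebuilt as usual, for example with `dpkg-buildpackage -us -uc`.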

I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

The commit log for this patch states:

From: Julien Grall <jgrall@amazon.com>
Date: Sat, 26 Sep 2020 17:44:29 +0100
Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()

The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
while the __acpi_os_{un,}map_memory() are meant to be arch-specific.

Currently, the former are still containing x86 specific code.

To avoid this rather strange split, the generic helpers are reworked so
they are arch-agnostic. This requires the introduction of a new helper
__acpi_os_unmap_memory() that will undo any mapping done by
__acpi_os_map_memory().

Currently, the arch-helper for unmap is basically a no-op so it only
returns whether the mapping was arch specific. But this will change
in the future.

Note that the x86 version of acpi_os_map_memory() was already able to
able the 1MB region. Hence why there is no addition of new code.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
(cherry picked from commit 1c4aa69ca1e1fad20b2158051eb152276d1eb973)
---------------------------------------------------

This patch does touch the amd64 ACPI code, and it is probably
what causes the problem on my amd64 system; that would explain
why my build of the xen-4.14 hypervisor without it (and without
the other patches listed above) fixed the problem.
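Salvatore's question about bisecting applies here too: since removing all nine patches is probably more than is needed, the suspicion about 0030 could be confirmed by bisecting the patch list instead of rebuilding once per patch. Below is a minimal Python sketch, assuming exactly one patch causes the failure and that every subset of the nine patches still applies and builds. `find_culprit` and the `shutdown_works_without` callback are hypothetical names, not part of any Debian or Xen tool.

```python
from __future__ import annotations

from typing import Callable, Sequence


def find_culprit(
    patches: Sequence[str],
    shutdown_works_without: Callable[[set[str]], bool],
) -> str:
    """Binary-search for the one patch whose removal fixes shutdown.

    Assumes exactly one patch in `patches` causes the failure.
    `shutdown_works_without(dropped)` is a hypothetical callback standing
    for: rebuild src:xen with `dropped` removed from the series, boot it,
    and report whether the machine powers off correctly.
    """
    candidates = list(patches)
    if not candidates:
        raise ValueError("no patches to search")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if shutdown_works_without(set(half)):
            candidates = half  # dropping this half fixed it: culprit is here
        else:
            candidates = candidates[len(half):]  # culprit is in the rest
    return candidates[0]
```

With the nine patches listed above this needs at most four rebuild-and-boot cycles. Because each test boots a real machine, the callback would in practice be a manual step.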

I think this bug should be re-classified as a bug in src:xen.

I would also ask the Debian Xen Team why they are
backporting patches from the upstream xen unstable
branch into Debian's 4.14 package, which currently ships
in Debian stable (bullseye). IMHO, the aforementioned
patches, which are not in the upstream stable 4.14 branch,
should not be included in the xen package for Debian stable.

Regards,

Chuck Zmudzinski

