
Bug#419950: eth ini. vs. ide ini.



For as long as I have had this computer, starting with a 2.6.8 sarge kernel,
I have had occasional problems where the network goes bad, with these lines
repeating forever in the syslog:
Feb 14 06:44:10 legba kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb 14 06:44:10 legba kernel: 0000:00:0f.0: tulip_stop_rxtx() failed

Sometimes the problem does not occur, and everything runs just fine until
I reboot the system, even if I pound on the network, trying to make it fail.
Sometimes the problem shows up, even with moderate network load, and the
network is _very_ sluggish until I reboot.

So far, this does not seem to depend on the kernel version.  Every kernel
I've tried fails on some boots and comes up OK on others.
After combing through the logs, I have found a pattern which correlates
with my problems.  It looks like when I have the problem, there is some
overlapping of the initialization messages for hda and for eth0; and when
the machine is booting OK and will not have a problem, these initialization
messages are separated in the logs.  Here are some sample logs:

Here is an extract from dmesg on 2008-04-01, still running today with
no problems:
Linux Tulip driver version 1.1.13-NAPI (May 11, 2002)
PCI: Enabling device 0000:00:0f.0 (0114 -> 0117)
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:0f.0[A] -> Link [LNKC] -> GSI 10 (level, low) -> IRQ 10
tulip0:  MII transceiver #1 config 1000 status 786d advertising 05e1.
eth0: ADMtek Comet rev 17 at 00011400, 00:14:BF:5C:E1:35, IRQ 10.
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller at PCI slot 0000:00:07.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0x1000-0x1007, BIOS settings: hda:pio, hdb:DMA
    ide1: BM-DMA at 0x1008-0x100f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
usb 1-2: new full speed USB device using uhci_hcd and address 2
usb 1-2: configuration #1 chosen from 1 choice
hub 1-2:1.0: USB hub found
hub 1-2:1.0: 4 ports detected
hda: WDC WD800JB-00CRA1, ATA DISK drive
Time: acpi_pm clocksource has been installed.
hdb: ST3250623A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: Hewlett-Packard CD-Writer Plus 9100, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 156301488 sectors (80026 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(33)
hda: cache flushes not supported
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 >
hdb: max request size: 512KiB
hdb: 488397168 sectors (250059 MB) w/16384KiB Cache, CHS=30401/255/63, UDMA(33)
hdb: cache flushes supported
 hdb: hdb1 hdb2

For contrast, here is a similar extract from 2008-03-16 dmesg, after which
the network became bad under light bittorrent pressure:
Linux Tulip driver version 1.1.13-NAPI (May 11, 2002)
hda: WDC WD800JB-00CRA1, ATA DISK drive
Time: acpi_pm clocksource has been installed.
hdb: ST3250623A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: Hewlett-Packard CD-Writer Plus 9100, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 9
PCI: setting IRQ 9 as level-triggered
ACPI: PCI Interrupt 0000:00:07.2[D] -> Link [LNKD] -> GSI 9 (level, low) -> IRQ 9
uhci_hcd 0000:00:07.2: UHCI Host Controller
uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:07.2: irq 9, io base 0x00001020
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
hda: max request size: 128KiB
hda: 156301488 sectors (80026 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(33)
hda: cache flushes not supported
 hda:PCI: Enabling device 0000:00:0f.0 (0114 -> 0117)
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:0f.0[A] -> Link [LNKC] -> GSI 10 (level, low) -> IRQ 10
tulip0:  MII transceiver #1 config 1000 status 786d advertising 05e1.
eth0: ADMtek Comet rev 17 at 00011400, 00:14:BF:5C:E1:35, IRQ 10.
 hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 >
hdb: max request size: 512KiB
hdb: 488397168 sectors (250059 MB) w/16384KiB Cache, CHS=30401/255/63, UDMA(33)
hdb: cache flushes supported
 hdb: hdb1 hdb2

What I notice here is that the log message that should be one line, like this:
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 >
is split after the " hda:" in every case of an unsuccessful boot,
with some of the ethernet initialization messages printed before the
remaining part of the hda message:
" hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 >"
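
The check for a split " hda:" line could be automated roughly as follows.
This is only a sketch: the function name partition_line_is_split is made up
for illustration, and it assumes raw dmesg text in which the partition listing
keeps its leading space, as in the extracts above.

```python
import re

def partition_line_is_split(dmesg_text, disk="hda"):
    """Return True if the ' hda:' partition listing is interrupted by other output."""
    # Assumption: the partition listing starts with leading whitespace (" hda:"),
    # unlike messages such as "hda: max request size", which start at column 0.
    prefix = re.compile(r"^\s+" + re.escape(disk) + r":(.*)$")
    for line in dmesg_text.splitlines():
        match = prefix.match(line)
        if match is None:
            continue
        tokens = match.group(1).split()
        # Intact line: every token is a partition name or an
        # extended-partition bracket ("<" or ">").
        intact = bool(tokens) and all(
            t in ("<", ">") or t.startswith(disk) for t in tokens
        )
        return not intact
    # No partition listing found for this disk: nothing to report.
    return False
```

Feeding it the output of dmesg right after boot would tell me whether a reboot
is needed, without reading the log by eye.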

From what I observe, this split correlates 100% with the bad network behavior.
The kernel version currently running here is:
Linux version 2.6.18-6-686 (Debian 2.6.18.dfsg.1-18etch1) (waldi@debian.org)

I'm willing to try other kernel versions or parameters, and willing to
provide any other info that might help someone understand this problem.

For now, I at least have a clumsy workaround of rebooting until I see that
the eth0 and hda initializations are not intermingled in dmesg.



