
Re: Netra T1 200 watchdog timeouts



Jurij Smakov wrote:
On Sun, Sep 23, 2012 at 02:07:46PM +0000, Mark Morgan Lloyd wrote:
It went in as 688521 at about the same time as you posted. Pity I
didn't hold off for another hour or so.

Thanks, I'll bcc this response to the bug, let's continue discussion there.

OK, but a couple of slightly more verbose comments here.

Looking at the output you see, though, I doubt that it has anything to do with SILO. SILO prints the letters 'S', 'I', 'L' and 'O' (appearing before the prompt) as it completes execution of successive parts of the first-stage loader. As you can see in the code (first/first.S), printing 'S' is the first thing the first-stage loader does upon startup. The fact that it is not seen in the console output suggests that the first-stage loader never even got to run. The line

Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a  File and args:

which is normally printed by OBP before control is passed to SILO, does not appear in the watchdog-reset case either, which again is a strong sign that the failure happens before SILO has a chance to run.
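As a rough illustration of that reasoning, a captured console log could be sorted into three cases with something like the following. This is only a sketch: boot_stage and its result labels are hypothetical names, and it assumes the four letters show up contiguously as "SILO" in the capture.

```shell
#!/bin/sh
# Hypothetical helper: report how far a captured console log got.
# Assumes OBP's "Boot device:" line and the letters "SILO" appear
# verbatim in the capture; adjust the patterns if your console differs.
boot_stage() {
    log=$1
    if ! grep -q 'Boot device:' "$log"; then
        # OBP never announced the boot device, so SILO cannot have run.
        echo "no-obp-handoff"
    elif ! grep -q 'SILO' "$log"; then
        # OBP handed over, but the first-stage loader did not finish.
        echo "obp-handoff-only"
    else
        # All four letters were printed by the first-stage loader.
        echo "silo-first-stage-done"
    fi
}
```

For the failure described in this thread, where neither the "Boot device:" line nor the 'S' appears, boot_stage would print no-obp-handoff.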

In a failure case, how long does it take between you typing 'boot' and
"watchdog reset" message being displayed? This doc

About a second.

http://docs.oracle.com/cd/E19102-01/n240.srvr/817-5481-11/understanding_wdtimer.html

appears to suggest that a stuck watchdog would initiate an XIR after 60 seconds by default. Is that consistent with what you see? What are the values of the various variables mentioned there on your system(s)? Does increasing the timeout help?

As far as I can see, that document refers to either ALOM or Solaris parameters. There's quite a terminology problem: the Netra T1 200 has a port labeled "A LOM" above another labeled "B SERIAL", but from what I can see that's /not/ a Sun ALOM port: it goes to a lomlite2 chip, which is something different.

Also, there are some things that might be relevant which can only be done by Solaris's lom command, which isn't available unless you install a not-freely-available package (it needs a device driver, unlike some of the RSC support on e.g. a 280R, which doesn't).

I really can't come up with any reason why it would work for Squeeze but not other releases, so testing all suspect SILO versions on the same machine would be an interesting experiment.

This is something I've not had to do before: Debian usually "just
works", or I have to go upstream if I want something bleeding-edge.
Is this syntax right, and in view of the message, what should I have
in sources.list etc.?

root@firewall3:/home/markMLl# apt-get install silo=1.4.14+git20100228-1+b1
..
E: Version '1.4.14+git20100228-1+b1' for 'silo' was not found

That only works when you have repositories containing older/newer packages listed in your /etc/apt/sources.list. Simply adding them (without configuring apt pinning appropriately) may mess up too many things, so the simplest way is probably to just download the older SILO debs (they should be available on archive.debian.org) and install them using dpkg -i.
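A minimal sketch of the download-and-install route might look like this. The function names silo_deb_url and install_silo are made up for illustration, the pool path layout is assumed from the mirror URLs quoted in this thread, and it assumes wget is installed. The install step has to run as root.

```shell
#!/bin/sh
# Build the pool URL for a given SILO version on sparc.
#   $1 = mirror base URL, $2 = package version
# The path layout is an assumption taken from the URLs in this thread.
silo_deb_url() {
    printf '%s/pool/main/s/silo/silo_%s_sparc.deb\n' "$1" "$2"
}

# Download one version and install it with dpkg -i (run as root).
install_silo() {
    url=$(silo_deb_url "$1" "$2") || return 1
    deb=${url##*/}                  # strip everything up to the last '/'
    wget -O "$deb" "$url" && dpkg -i "$deb"
}

# Example, not executed here:
#   install_silo http://archive.debian.org/debian 1.4.13-1
```

Whether installing the package alone rewrites the boot block is something I would not rely on; it is probably safest to rerun silo by hand after each dpkg -i before rebooting.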

I can't find a binary for 1.4.14+git20100207-1 that you wanted me to test. I can see versions as below so I'll start working through them.

http://ftp.uk.debian.org/debian/pool/main/s/silo/silo_1.4.14+git20120819-1_sparc.deb
http://ftp.uk.debian.org/debian/pool/main/s/silo/silo_1.4.14+git20100228-1+b1_sparc.deb
http://archive.debian.org/debian/pool/main/s/silo/silo_1.4.13a+git20070930-3_sparc.deb
http://archive.debian.org/debian/pool/main/s/silo/silo_1.4.13-1_sparc.deb
http://archive.debian.org/debian/pool/main/s/silo/silo_1.4.9-1_sparc.deb
http://archive.debian.org/debian/pool/main/s/silo/silo_1.2.5-2_sparc.deb

I'm on it, unless we get more than the usual number of Monday-morning blowups.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]

