
Re: I suspect the kernel: `ping', and name resolution in general, hangs



Since name resolution works with 2.0.36, your /etc/resolv.conf is
probably fine.

Can you successfully ping or traceroute -n to your DNS server with
2.2.12 (use the IP address you have in resolv.conf)?
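
If ping by address works but you want to know whether the nameservers
themselves answer, one option is to send a DNS query straight to each
address in resolv.conf, bypassing the resolver library. What follows is
only a rough Python sketch of that idea, not a tested diagnostic: the
function names (parse_nameservers, build_query, probe,
check_resolv_conf), the 3-second timeout, and the query ID are all my
own assumptions, and it assumes the servers accept plain UDP queries on
port 53.

```python
import socket
import struct


def parse_nameservers(resolv_conf_text):
    """Return the addresses found on 'nameserver' lines of resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Lines such as 'search ...' or comments do not start with 'nameserver'.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query (type A, class IN) for hostname."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)


def probe(server, hostname, timeout=3.0):
    """Send one UDP query to server:53; True if a reply with our ID arrives."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as 'network unreachable' and similar.
            return False
    return reply[:2] == query[:2]


def check_resolv_conf(path="/etc/resolv.conf", hostname="blarg.net"):
    """Print, for each configured nameserver, whether it answered in time."""
    with open(path) as f:
        for server in parse_nameservers(f.read()):
            status = "answered" if probe(server, hostname) else "no reply"
            print(server, status)
```

Calling check_resolv_conf() under both kernels would show whether the
queries get answers at all. If every server reports "no reply" under
2.2.12 while ping by address still works, that points at UDP traffic to
the nameservers rather than at the resolver configuration.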

On Thu, Oct 07, 1999 at 09:23:47AM -0700, Eric Hanchrow wrote:
> 
> Last month I had a problem: in short, I installed potato, and noticed
> that name resolution hung, although things worked fine if I used a
> numeric IP address.
> 
> I've included below the plea for help that I sent last month.  It
> describes the problem in detail.
> 
> Well, in case anyone's interested, I have some more information that
> leads me to suspect that the problem is in the kernel (and thus,
> presumably, in the Vortex driver).  Here's what I did:
> 
> * I installed potato from scratch.  I did this by installing slink
>   from an official Debian 2.1 CD, and then doing `apt-get
>   dist-upgrade' with my /etc/apt/sources.list pointing at
> 
> 	http://http.us.debian.org/debian unstable
> 
>   Thus I wound up with the latest (as of this morning) binaries, but
>   with the 2.0.36 kernel from the CD.  (Apparently `apt-get
>   dist-upgrade' didn't automatically give me a new kernel.)  This
>   system worked flawlessly; in particular, name resolution worked
>   fine.
> 
> * I installed kernel-image-2.2.12 (version 2.2.12-3), and rebooted.
>   Name resolution hung exactly as described below.
> 
> * I reinstalled kernel-image-2.0.36, and rebooted; name resolution
>   worked just fine.
> 
> So it seems to me that the newer kernel is doing something wrong.  If
> anyone would like me to perform some experiments, so as to isolate the
> problem, I'd be happy to do them; just tell me what you need done.
> Unfortunately, I know nothing about how the net card driver works, so
> I don't know how to investigate this on my own.
> 
> Here's the plea that I sent last month:
> 
>     Can anyone tell me what's wrong with my system?  At first I assumed it
>     was a bug in the resolver library, and opened a bug against libc6 in
>     Debian potato (http://www.debian.org/Bugs/db/45/45912.html); but the
>     Debian libc6 maintainer is sure that my system is merely
>     misconfigured.
> 
>     Here's the problem:
> 
>     When I type `ping blarg.net' at a shell, `ping' hangs.  I expect it to display
> 
> 	    PING blarg.net (206.124.128.1): 56 data bytes
> 	    64 bytes from 206.124.128.1: icmp_seq=0 ttl=62 time=25.7 ms
> 	    ...
> 
>     Other name resolution also fails.  For example, Netscape hangs when
>     trying to visit web pages on machines other than mine.
> 
>     On the other hand, if I type `ping 206.124.128.1', that works fine.
>     So I know that IP and the network card aren't entirely broken.
> 
>     I've never sat around and waited to see if `ping' eventually gets
>     unstuck; I've always given up and hit control-C after no more than
>     perhaps a minute.
> 
>     I'm using potato (that is, the still-unreleased version of Debian
>     GNU/Linux), which I installed by first installing slink (i.e., Debian
>     2.1) from an official CD-ROM, and then using `apt-get dist-upgrade'
>     from
> 
> 	     http://http.us.debian.org/debian unstable main
> 
>     I did that update around 24 September.
> 
>     Here is some information about the broken system:
> 
>     Package: netbase
>     Version: 3.16-2
> 
>     Package: kernel-image-2.2.9
>     Version: 2.2.9-2
> 
>       My network card driver is 3c59x:
> 
> 	Sep 24 07:21:13 potato kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html
> 	Sep 24 07:21:13 potato kernel: eth0: 3Com 3Com Boomerang (unknown version) at 0xb800,  00:50:04:1b:f6:df, IRQ 11
> 	Sep 24 07:21:13 potato kernel:   8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
> 	Sep 24 07:21:13 potato kernel:   MII transceiver found at address 24, status 182d.
> 	Sep 24 07:21:13 potato kernel:   Enabling bus-master transmits and whole-frame receives.
> 
>     This problem didn't always happen, although I don't remember exactly
>     when it started.  I know for certain that it didn't happen immediately
>     after I installed slink, nor did it happen immediately after I
>     upgraded to potato the first time.
> 
>     I've also seen this problem on a different installation of slink (on
>     the same machine with the same hardware), but that problem
>     mysteriously went away.  I now have both slink and potato on this
>     machine, and slink works flawlessly.  Only potato has this
>     name-resolution problem.
> 
>     I haven't noticed any error messages -- certainly none at the shell on
>     which I ran `ping', and none in /var/log.
> 
>     I connect to the Internet via DSL, using a Cisco 675 router, which is
>     a little grey box that sits on the floor (the phone company gave it to
>     me when I signed up for DSL).  I have a phone cord that connects the
>     router and my phone jack; I have an Ethernet cable that connects the
>     router and my network card.
> 
>     The router is quite configurable, and perhaps its configuration is
>     relevant: 
> 
>     * I've got it set to act as a DHCP server, although since I don't know
>       how to make Debian use DHCP, I've told Debian to use a static IP
>       address.  Since I only have one computer, there is no risk of having
>       two IP addresses conflict.
> 
>     * It's doing something called `network address translation', which, as
>       I understand it, means that my machine "appears" to the outside
>       world to have a different IP address than the one the machine thinks it has.
>       That is (as you can see below in my network configuration files), my
>       machine thinks its IP address is 10.0.0.2, but the outside world
>       uses 206.124.128.30 (that address might change from time to time,
>       because the router might be a DHCP client of my ISP).  Also, if I
>       were to connect other machines to the router (with an Ethernet hub),
>       they would get IP addresses like 10.0.0.3, 10.0.0.4, etc.; but they
>       would *all* appear to the outside world as 206.124.128.30.  You might
>       expect this to cause total confusion, but it doesn't;
>       somehow this `network address translation' keeps things from getting
>       confused.  I don't understand how it does this, but it seems to work
>       OK.  (The place I work used to have a similar setup; they had five
>       machines connected to the Internet, all "sharing" an outside IP
>       address; the machines all worked fine.)  The one tradeoff that I
>       know of is that nobody in the outside world can connect to any
>       servers that I run, because the network address translation
>       apparently futzes with port numbers.  For example, my SMTP server
>       listens on port 25, but someone who tries to connect to that port
>       using my outside IP address 206.124.128.30 won't be able to.
>       Presumably, if they could guess the port to which the router has
>       "mapped" port 25, they could connect to that port.
> 
>       There may be some more information about the configuration of this
>       box that is relevant.  Please feel free to ask me about it, if you
>       think it would help.
> 
>     Perhaps some of the following network configuration files are
>     relevant:
> 
>     /etc/resolv.conf:
> 	nameserver 206.124.128.1
> 	nameserver 206.124.128.3
> 
>     /etc/hosts:
> 	127.0.0.1   localhost loopback
> 	 10.0.0.1   cisco-router
> 	 10.0.0.2   potato
> 
>     /etc/init.d/network:
> 	#! /bin/sh
> 	ifconfig lo 127.0.0.1
> 	route add -net 127.0.0.0
> 	IPADDR=10.0.0.2
> 	NETMASK=255.255.255.0
> 	NETWORK=10.0.0.0
> 	BROADCAST=10.0.0.255
> 	GATEWAY=10.0.0.1
> 	ifconfig eth0 ${IPADDR} netmask ${NETMASK} broadcast ${BROADCAST}
> 	route add -net ${NETWORK}
> 	[ "${GATEWAY}" ] && route add default gw ${GATEWAY} metric 1
> 
>     Note that those three files are almost-exact copies of the same files
>     on my slink system, which as I said works fine.  The only differences
>     are 
> 	--- /slink/etc/resolv.conf	Sun Sep 12 04:06:13 1999
> 	+++ /potato/etc/resolv.conf	Mon Sep 20 22:00:49 1999
> 	@@ -1,3 +1,2 @@
> 	-search hanchrow.org
> 	 nameserver 206.124.128.1
> 	 nameserver 206.124.128.3
> 
>     (I don't know what that `search' line is doing on my slink system; I
>     assume that it got put there when I installed the system)
> 
> 	--- /slink/etc/hosts	Sun Sep 12 12:49:07 1999
> 	+++ /potato/etc/hosts	Tue Sep 21 22:29:44 1999
> 	@@ -1,3 +1,4 @@
> 	 127.0.0.1	localhost loopback
> 	  10.0.0.1	cisco-router
> 	- 10.0.0.2	snowball
> 	\ No newline at end of file
> 	+ 10.0.0.2	potato
> 	+
> 
>     Now, here's the kicker: the problem goes away if I run `tcpdump': I do
> 
> 	   tcpdump &
> 	   ping blarg.net
> 
>     and `ping' responds correctly.  I can then kill `tcpdump', and until
>     the next time I boot, the network works fine.  It's as if `tcpdump'
>     changed something, and that change allows name resolution to work.
> 
>     So that's the deal.  Any ideas why my system is behaving this way, and
>     what I can do about it?
> 
>     Thanks
> 
> 
> 

-- 
Bob Nielsen                 Internet: nielsen@primenet.com
Tucson, AZ                  AMPRnet:  w6swe@w6swe.ampr.org
DM42nh                      http://www.primenet.com/~nielsen

