Debian Bug report logs - #45912
`ping', and name resolution in general, hangs

Package: libc6; Maintainer for libc6 is GNU Libc Maintainers <debian-glibc@lists.debian.org>; Source for libc6 is src:glibc.

Reported by: offby1@blarg.net

Date: Fri, 24 Sep 1999 15:48:00 UTC

Severity: normal

Found in versions 2.1.2-3, 2.1.3-13

Done: Ben Collins <bcollins@debian.org>

Bug is archived. No further changes may be made.



Report forwarded to debian-bugs-dist@lists.debian.org, Debian GNU C Library Maintainers <debian-glibc@lists.debian.org>:
Bug#45912; Package libc6.


Acknowledgement sent to offby1@blarg.net:
New Bug report received and forwarded. Copy sent to Debian GNU C Library Maintainers <debian-glibc@lists.debian.org>.


Message #5 received at submit@bugs.debian.org:

From: offby1@blarg.net
To: submit@bugs.debian.org, offby1@blarg.net
Subject: `ping', and name resolution in general, hangs
Date: Fri, 24 Sep 1999 08:30:55 -0700 (PDT)
Package: libc6
Version: 2.1.2-3
Severity: critical

I'm not sure which package to report this against; libc6 is my best
guess.  Other relevant packages might be

Package: netbase
Version: 3.16-2

Package: kernel-image-2.2.9
Version: 2.2.9-2

  My network card driver is 3c59x:

    Sep 24 07:21:13 potato kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html
    Sep 24 07:21:13 potato kernel: eth0: 3Com 3Com Boomerang (unknown version) at 0xb800,  00:50:04:1b:f6:df, IRQ 11
    Sep 24 07:21:13 potato kernel:   8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
    Sep 24 07:21:13 potato kernel:   MII transceiver found at address 24, status 182d.
    Sep 24 07:21:13 potato kernel:   Enabling bus-master transmits and whole-frame receives.

Here's the problem:

When I type `ping blarg.net' at a shell, `ping' hangs.  I expect it to display

	PING blarg.net (206.124.128.1): 56 data bytes
	64 bytes from 206.124.128.1: icmp_seq=0 ttl=62 time=25.7 ms
	...

Other name resolution also fails.  For example, Netscape hangs when
trying to visit web pages on machines other than mine.

I've never sat around and waited to see if `ping' eventually gets
unstuck; I've always given up and hit control-C after no more than
perhaps a minute.

In short, the entire network is completely unusable.

I'm using potato, which I installed by first installing slink from an
official CD-ROM, and then using `apt-get dist-upgrade' from 

	 http://http.us.debian.org/debian unstable main

This problem didn't always happen, although I don't remember exactly
when it started.  I know for certain that it didn't happen immediately
after I installed slink, nor did it happen immediately after I
upgraded to potato the first time.

I've also seen this problem on a different installation of slink (on
the same machine with the same hardware), but that problem
mysteriously went away, and I never reported it.

     * The exact and complete text of any error messages printed or
       logged. This is very important!

I haven't noticed any error messages -- certainly none at the shell on
which I ran `ping', and none in /var/log.

     * Exactly what you typed or did to demonstrate the problem.

As above.

     * A description of the incorrect behaviour: exactly what behaviour
       you were expecting, and what you observed. A transcript of an
       example session is a good way of showing this.

As above.

     * A suggested fix, or even a patch, if you have one.

Sorry!  But see the bizarre workaround involving `tcpdump', below.

     * Details of the configuration of the program with the problem.
       Include the complete text of its configuration files.

I don't think a particular program is at fault; if the problem is in
Debian at all, I assume it's in the resolver library, or perhaps in
the driver for the network card.  And I'm not aware of any
configuration files for either the library or the network card.

     * The versions of any packages on which the buggy package depends.

I don't believe libc6 or the net card driver depend on any package.

     * What kernel version you're using (type uname -a).

    Linux potato 2.2.9 #2 Fri Jun 4 23:14:38 EST 1999 i686 unknown

... but note that, as I explain above, I've seen the same problem on
slink, using kernel 2.0.36 with version 0.99E of the net card driver,
and libc6 version 2.0.7.

     * What shared C library you're using (type ls -l /lib/libc.so.6).

    bash-2.02$ ls -l /lib/libc.so.6
    lrwxrwxrwx   1 root     root           13 Sep 24 07:35 /lib/libc.so.6 -> libc-2.1.2.so

     * Any other details of your Linux system, if it seems appropriate.
       For example, if you had a problem with a Debian Perl script, you
       would want to provide the version of the `perl' binary (perl -v).

I connect to the Internet via DSL, using a Cisco 675 router, which is
a little grey box that sits on the floor.  I have a phone cord that
connects the router and my phone jack; I have an Ethernet cable that
connects the router and my network card.

The router is quite configurable, and perhaps its configuration is
relevant: 

* I've got it set to act as a DHCP server, although since I don't know
  how to make Debian use DHCP, I've told Debian to use a static IP
  address.  Since I only have one computer, there is no risk of having
  two IP addresses conflict.

* It's doing something called `network address translation', which, as
  I understand it, means that my machine "appears" to the outside
  world to have a different IP address than what the machine thinks.
  That is (as you can see below in my network configuration files), my
  machine thinks its IP address is 10.0.0.2, but the outside world
  uses 206.124.128.30 (that address might change from time to time,
  because the router might be a DHCP client of my ISP).  Also, if I
  were to connect other machines to the router (with an Ethernet hub),
  they would get IP addresses like 10.0.0.3, 10.0.0.4, etc.; but they
  would *all* appear to the outside world as 206.124.128.30.  It would
  appear that this would cause total confusion, but it doesn't;
  somehow this `network address translation' keeps things from getting
  confused.  I don't understand how it does this, but it seems to work
  OK.  (The place I work used to have a similar setup; they had five
  machines connected to the Internet, all "sharing" an outside IP
  address; the machines all worked fine.)  The one tradeoff that I
  know of is that nobody in the outside world can connect to any
  servers that I run, because the network address translation
  apparently futzes with port numbers.  For example, my SMTP server
  listens on port 25, but someone who tries to connect to that port
  using my outside IP address 206.124.128.30 won't be able to.
  Presumably, if they could guess the port to which the router has
  "mapped" port 25, they could connect to that port.

  There may be some more information about the configuration of this
  box that is relevant.  Please feel free to ask.

Perhaps some of the following network configuration files are
relevant:

/etc/resolv.conf:
    nameserver 206.124.128.1
    nameserver 206.124.128.3

/etc/hosts:
    127.0.0.1	localhost loopback
     10.0.0.1	cisco-router
     10.0.0.2	potato

/etc/init.d/network:
    #! /bin/sh
    ifconfig lo 127.0.0.1
    route add -net 127.0.0.0
    IPADDR=10.0.0.2
    NETMASK=255.255.255.0
    NETWORK=10.0.0.0
    BROADCAST=10.0.0.255
    GATEWAY=10.0.0.1
    ifconfig eth0 ${IPADDR} netmask ${NETMASK} broadcast ${BROADCAST}
    route add -net ${NETWORK}
    [ "${GATEWAY}" ] && route add default gw ${GATEWAY} metric 1

     * Appropriate details of the hardware in your system. If you're
       reporting a problem with a device driver please list all the
       hardware in your system, as problems are often caused by IRQ and
       I/O address conflicts.

    bash-2.02$ cat /proc/devices 
    Character devices:
      1 mem
      2 pty
      3 ttyp
      4 ttyS
      5 cua
      7 vcs
     10 misc
     12 tpqic02
     29 fb

    Block devices:
      1 ramdisk
      2 fd
      3 ide0
      7 loop
      9 md
     22 ide1
     36 ed

Oddly, the problem goes away if I run `tcpdump': I do

       tcpdump &
       ping blarg.net

and `ping' responds correctly.  I can then kill `tcpdump', and until
the next time I boot, the network works fine.  It's as if `tcpdump'
changed something, and that change allows name resolution to work.

**

I'd be happy to help debug this, by perhaps making some program run
more verbosely, and then reporting the output; or perhaps installing a
special debugging version of something, and trying it out.  Let me
know.


Information forwarded to debian-bugs-dist@lists.debian.org, Debian GNU C Library Maintainers <debian-glibc@lists.debian.org>:
Bug#45912; Package libc6.


Acknowledgement sent to Joost Kooij <joost@pc47.mpn.cp.philips.com>:
Extra info received and forwarded to list. Copy sent to Debian GNU C Library Maintainers <debian-glibc@lists.debian.org>.


Message #10 received at 45912@bugs.debian.org:

From: Joost Kooij <joost@pc47.mpn.cp.philips.com>
To: offby1@blarg.net, 45912@bugs.debian.org
Subject: Re: Bug#45912: `ping', and name resolution in general, hangs
Date: Fri, 24 Sep 1999 18:57:13 +0200 (CEST)
Hi,

On Fri, 24 Sep 1999 offby1@blarg.net wrote:

> Package: libc6
> Version: 2.1.2-3
> Severity: critical

Surely, as you say you don't know for sure that this bug is in libc6, you
cannot mark it as "critical"?  The actual impact of the bug doesn't seem
to warrant a "critical" severity anyways (have you actually read the bug
submitting guidelines?)

I'm sorry to say, but from what I'm reading in your report, it seems just
very likely that you have a problem setting up your machine or network.
 
> Here's the problem:
> 
> When I type `ping blarg.net' at a shell, `ping' hangs.  I expect it to display
> 
> 	PING blarg.net (206.124.128.1): 56 data bytes
> 	64 bytes from 206.124.128.1: icmp_seq=0 ttl=62 time=25.7 ms
> 	...

What happens if you `ping 206.124.128.1'? Does that work? If so, your
network card and kernel are very likely to be fine.  If so, the problem is
restricted to name resolving and although libc6 plays some role in that
arena, it's very much unlikely that it is the culprit here, as it would be
a quite blatant failure in libc which numerous others would have reported
here already.
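
A minimal sketch of that two-step check, assuming a `ping` that accepts `-c` and a libc that ships `getent`. The helper name `diagnose` is made up for illustration; it only interprets two exit statuses and does no probing itself:

```shell
#!/bin/sh
# diagnose IP_STATUS NAME_STATUS
# Hypothetical helper: both arguments are exit statuses (0 = success)
# from a ping by numeric address and from a name lookup.
diagnose() {
    if [ "$1" -ne 0 ]; then
        echo "no IP connectivity: check card, cabling, routes, router"
    elif [ "$2" -ne 0 ]; then
        echo "IP works but name lookup fails: check resolver setup"
    else
        echo "IP and name lookup both work"
    fi
}
```

It could be fed like this:

    ping -c 1 206.124.128.1 >/dev/null 2>&1; ip=$?
    getent hosts blarg.net >/dev/null 2>&1; name=$?
    diagnose "$ip" "$name"

Since the lookup in this report hangs rather than fails, the `getent` step may sit there until the resolver gives up.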

> Other name resolution also fails.  For example, Netscape hangs when
> trying to visit web pages on machines other than mine.

That would be explained pretty nicely by the hypothesis of a
non-working resolver setup.
 
> I've never sat around and waited to see if `ping' eventually gets
> unstuck; I've always given up and hit control-C after no more than
> perhaps a minute.

You should try waiting 4 minutes, as the default name lookup timeout is 3
minutes.  The results would probably be somewhat informative.

> This problem didn't always happen, although I don't remember exactly
> when it started.  I know for certain that it didn't happen immediately
> after I installed slink, nor did it happen immediately after I
> upgraded to potato the first time.

My guess is that it happened when your router started having ideas of its
own about how your network is set up.

>      * The exact and complete text of any error messages printed or
>        logged. This is very important!
> 
> I haven't noticed any error messages -- certainly none at the shell on
> which I ran `ping', and none in /var/log.

Hmm, I tend to say that you could have done better here, like run some
more diagnostics and present the output/results of that, maybe even
hypothesize a bit yourself and provide some discussion material pro/contra
the various options, based on some experimental data.

>      * Details of the configuration of the program with the problem.
>        Include the complete text of its configuration files.
> 
> I don't think a particular program is at fault; if the problem is in
> Debian at all, I assume it's in the resolver library, or perhaps in
> the driver for the network card.  And I'm not aware of any
> configuration files for either the library or the network card.

That's not much ground for submitting a "critical" bug, eh?

On a personal note, IMHO you really should have asked first on
debian-user.  Oh, but you already did, didn't you?

>      * Any other details of your Linux system, if it seems appropriate.
>        For example, if you had a problem with a Debian Perl script, you
>        would want to provide the version of the `perl' binary (perl -v).
> 
> I connect to the Internet via DSL, using a Cisco 675 router, which is
> a little grey box that sits on the floor.  I have a phone cord that
> connects the router and my phone jack; I have an Ethernet cable that
> connects the router and my network card.
> 
> The router is quite configurable, and perhaps its configuration is
> relevant: 
> 
> * I've got it set to act as a DHCP server, although since I don't know
>   how to make Debian use DHCP, I've told Debian to use a static IP
>   address.  Since I only have one computer, there is no risk of having
>   two IP addresses conflict.
> 
> * It's doing something called `network address translation', which, as
>   I understand it, means that my machine "appears" to the outside
>   world to have a different IP address than what the machine thinks.
>   That is (as you can see below in my network configuration files), my
>   machine thinks its IP address is 10.0.0.2, but the outside world
>   uses 206.124.128.30 (that address might change from time to time,
>   because the router might be a DHCP client of my ISP).  Also, if I
>   were to connect other machines to the router (with an Ethernet hub),
>   they would get IP addresses like 10.0.0.3, 10.0.0.4, etc.; but they
>   would *all* appear to the outside world as 206.124.128.30.  It would
>   appear that this would cause total confusion, but it doesn't;
>   somehow this `network address translation' keeps things from getting
>   confused.  I don't understand how it does this, but it seems to work
>   OK.  (The place I work used to have a similar setup; they had five
>   machines connected to the Internet, all "sharing" an outside IP
>   address; the machines all worked fine.)  The one tradeoff that I
>   know of is that nobody in the outside world can connect to any
>   servers that I run, because the network address translation
>   apparently futzes with port numbers.  For example, my SMTP server
>   listens on port 25, but someone who tries to connect to that port
>   using my outside IP address 206.124.128.30 won't be able to.
>   Presumably, if they could guess the port to which the router has
>   "mapped" port 25, they could connect to that port.

I think you really should read some documentation, my friend.  I propose
to you the following (among others) excellent materials: the Net-3-HOWTO,
the IP-Masquerading-HOWTO, the Linux Network Administrator's Guide (both
available as a free download and as an O'Reilly Book.)  And did I mention
yet the debian-user mailing list, also an excellent source of peer
information, inside knowledge and pointers to vast areas of useful
documentation?
 
> Oddly, the problem goes away if I run `tcpdump': I do
> 
>        tcpdump &
>        ping blarg.net

Tcpdump places your network adapter in promiscuous mode.  That means it
will pick up, and to some extent also process, network packets that it
officially shouldn't.

Looking into my crystal ball, my guess is that this magically and
mysteriously causes the Cisco and your linux box to "see" each other.

> and `ping' responds correctly.  I can then kill `tcpdump', and until
> the next time I boot, the network works fine.  It's as if `tcpdump'
> changed something, and that change allows name resolution to work.

Try running "arp -a" before and after that tcpdump trick.  Also, play a
bit with "netstat", "ifconfig" and "route".  If it doesn't help
immediately, it's still fun to play with anyway and you'll learn a lot
while doing so.
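
A minimal sketch of the before/after comparison, assuming the `arp -a` output has been saved to two files. The helper name `new_lines` is made up for illustration:

```shell
#!/bin/sh
# new_lines BEFORE AFTER
# Hypothetical helper: print the lines of file AFTER that do not occur
# verbatim in file BEFORE (for example, ARP entries that appeared after
# tcpdump had been run).
new_lines() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -F -x -v -f "$1" "$2"
}
```

For example:

    arp -a > /tmp/arp.before
    (run the tcpdump trick here)
    arp -a > /tmp/arp.after
    new_lines /tmp/arp.before /tmp/arp.after

If nothing is printed, the ARP table did not gain any entries and the change is somewhere else.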

Try similar diagnostics on the Cisco if you can.  Read all the
documentation about the Cisco.  It sounds like you'll need to know a bit
about how it works and how to make it jibe with your Linux machine in order for
your network to work correctly.

Cheers,


Joost




Information forwarded to debian-bugs-dist@lists.debian.org, Debian GNU C Library Maintainers <debian-glibc@lists.debian.org>:
Bug#45912; Package libc6.


Acknowledgement sent to "Kingsley G. Morse Jr." <change@nas.com>:
Extra info received and forwarded to list. Copy sent to Debian GNU C Library Maintainers <debian-glibc@lists.debian.org>.


Message #15 received at 45912@bugs.debian.org:

From: "Kingsley G. Morse Jr." <change@nas.com>
To: offby1@blarg.net, 45912@bugs.debian.org
Subject: Re: Bug 45912: ping, and name resolution in general, hangs
Date: Thu, 7 Oct 1999 21:05:02 -0700
I fixed a similar bug after upgrading to 2.2.12. Like you, I was using a
bogus IP address internally and a real IP address for the rest of the web.

In my case, the breakthrough was using tcpdump to diagnose the problem. I
think tcpdump may well help you too. For example, you could first run
tcpdump and then ping. tcpdump should then show you the packets that ping
is sending out. 

In my case, the new routing code in the 2.2.* kernel was causing the bogus
internal IP address to leak out to the rest of the net as the source
address of my outgoing packets. Therefore, no packets came back to me! I
fixed it by removing a netmask in my net (diald) configuration, but I
suggest that you run tcpdump first to diagnose your particular problem.
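
One way to spot such a leak in a capture is to test each source address against the private ranges of RFC 1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). A minimal sketch follows; the function name `is_rfc1918` is invented here, and pulling the source address out of tcpdump's output is left out because that format varies between versions:

```shell
#!/bin/sh
# is_rfc1918 ADDRESS
# Hypothetical helper: succeed (exit 0) if the dotted-quad ADDRESS lies in
# one of the RFC 1918 private ranges, fail (exit 1) otherwise.
is_rfc1918() {
    case "$1" in
        10.*|192.168.*) return 0 ;;
        # 172.16.0.0/12 covers 172.16.x.x through 172.31.x.x
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example:

    is_rfc1918 10.0.0.2 && echo "private source address"

On the LAN side of a NAT router a private source address is expected; it is only a problem on the link that faces the real network.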

Thanks,
Kingsley



Severity set to `normal'. Request was from Joel Klecker <jk@espy.org> to control@bugs.debian.org.


Information forwarded to debian-bugs-dist@lists.debian.org, Joel Klecker <debian-glibc@lists.debian.org>:
Bug#45912; Package libc6.


Acknowledgement sent to Tuomas Heino <iheino@cc.hut.fi>:
Extra info received and forwarded to list. Copy sent to Joel Klecker <debian-glibc@lists.debian.org>.


Message #22 received at 45912@bugs.debian.org:

From: Tuomas Heino <iheino@cc.hut.fi>
To: offby1@blarg.net
Cc: 45912@bugs.debian.org
Subject: Re: `ping', and name resolution in general, hangs
Date: Wed, 17 Nov 1999 18:07:55 +0200 (EET)
Just a random guess... were/are you using nscd? It has had several bugs
that could cause that kind of behaviour...
Use the following commands to check whether it's currently installed/in use:
$ dpkg -s nscd | grep ^Status
$ ps auwx | grep nscd



Information forwarded to debian-bugs-dist@lists.debian.org, Ben Collins <bcollins@debian.org>:
Bug#45912; Package libc6.


Acknowledgement sent to Eric Hanchrow <offby1@blarg.net>:
Extra info received and forwarded to list. Copy sent to Ben Collins <bcollins@debian.org>.


Message #27 received at 45912@bugs.debian.org:

From: Eric Hanchrow <offby1@blarg.net>
To: Debian Bug Tracking System <45912@bugs.debian.org>
Subject: problem appears to be my Cisco 675; here's a workaround
Date: Wed, 04 Oct 2000 16:18:16 -0700
Package: libc6
Version: 2.1.3-13

Someone, I don't remember who, told me to try this workaround.  It's
worked flawlessly for over a year (sorry I didn't think to report it
earlier).

I simply put these lines into /etc/init.d/network (on a potato system,
there might be a more appropriate location for them)

    # Work around a possible bug in the Cisco router
    if [ -f /proc/sys/net/ipv4/ip_local_port_range ]; then 
      echo 1025 4999 >/proc/sys/net/ipv4/ip_local_port_range
    fi

The theory is that the router somehow mangles connections whose source
port is exactly 1024; this workaround prevents such connections from
being made.
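
A minimal sketch for checking whether the workaround is in effect, i.e. whether port 1024 still falls inside the configured range. The helper name `range_includes` is made up for illustration; it takes the "LOW HIGH" pair as a single string, as read from the /proc file:

```shell
#!/bin/sh
# range_includes "LOW HIGH" PORT
# Hypothetical helper: succeed (exit 0) if PORT lies within LOW..HIGH
# inclusive. The first argument is the whitespace-separated pair as it
# appears in /proc/sys/net/ipv4/ip_local_port_range.
range_includes() {
    # Split the pair on whitespace: $1=LOW, $2=HIGH, $3=PORT afterwards.
    set -- $1 "$2"
    [ "$3" -ge "$1" ] && [ "$3" -le "$2" ]
}
```

For example:

    range=$(cat /proc/sys/net/ipv4/ip_local_port_range)
    if range_includes "$range" 1024; then
      echo "source port 1024 can still be chosen"
    fi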

-- System Information
Debian Release: 2.2
Architecture: i386
Kernel: Linux offby1 2.2.17 #1 Sun Jun 25 09:24:41 EST 2000 i686

Versions of packages libc6 depends on:
ii  ldso                          1.9.11-9   The Linux dynamic linker, library 




Reply sent to Ben Collins <bcollins@debian.org>:
You have taken responsibility.


Notification sent to offby1@blarg.net:
Bug acknowledged by developer.


Message #32 received at 45912-done@bugs.debian.org:

From: Ben Collins <bcollins@debian.org>
To: 45912-done@bugs.debian.org
Subject: reporter claims it isn't a bug in libc
Date: Mon, 16 Oct 2000 10:19:08 -0400
-- 
 -----------=======-=-======-=========-----------=====------------=-=------
/  Ben Collins  --  ...on that fantastic voyage...  --  Debian GNU/Linux   \
`  bcollins@debian.org  --  bcollins@openldap.org  --  bcollins@linux.com  '
 `---=========------=======-------------=-=-----=-===-======-------=--=---'




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun May 5 14:53:24 2024; Machine Name: buxtehude

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.