Debian Bug report logs - #59829
libc6: [PATCH] fnmatch() behaves oddly with *s and FNM_LEADING_DIR

Package: libc6; Maintainer for libc6 is GNU Libc Maintainers <debian-glibc@lists.debian.org>; Source for libc6 is src:glibc.

Reported by: Ryan Tracey <ryant@thawte.com>

Date: Tue, 7 Mar 2000 12:18:01 UTC

Severity: fixed

Done: Ben Collins <bcollins@debian.org>

Bug is archived. No further changes may be made.

Forwarded to libc-alpha@sourceware.cygnus.com


Report forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#59829; Package tar.


Acknowledgement sent to Ryan Tracey <ryant@thawte.com>:
New Bug report received and forwarded. Copy sent to Bdale Garbee <bdale@gag.com>.


Message #5 received at submit@bugs.debian.org:

From: Ryan Tracey <ryant@thawte.com>
To: submit@bugs.debian.org
Subject: tar 1.13.17 "--exclude" problems
Date: Tue, 07 Mar 2000 12:15:56 +0000
Package: tar
Version: 1.13.17-1

Tar no longer excludes the files and directories that previous versions
used to exclude (sorry, I have no idea with which version the change
occurred, but it was within the past month or so). For example, to
exclude all the MS Frontpage extensions files and directories in the
'webspace' directory tree, I used to do this:

	tar xcf /var/tmp/website.tgz --exclude=_\* webspace\

This used to exclude all the _vti_cnf/ and _private/ directories that
don't need to be on the main website.

I understand that there have been some changes in the way that tar
handles the exclude patterns, but there doesn't seem to be any pattern
that matches any directory that begins with the underscore character.
I've tried:

--exclude '_*'
--exclude '/_*'
etc..

This kind of file always ends up in the archive:
"webspace/synthesis/_vti_cnf/contents.html"
 
I am using Debian 2.2 with kernel 2.2.13 and libc 2.1.3-2.

Any help would be appreciated.

Thanks and regards,
Ryan


Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#59829; Package tar.


Acknowledgement sent to Colin Watson <cjw44@flatline.org.uk>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>.


Message #10 received at 59829@bugs.debian.org:

From: Colin Watson <cjw44@flatline.org.uk>
To: 59829@bugs.debian.org
Cc: control@bugs.debian.org, 59829-submitter@bugs.debian.org, Matthew Thompson <mattyt@oz.net>
Subject: libc6: [PATCH] fnmatch() behaves oddly with *s and FNM_LEADING_DIR
Date: Thu, 22 Jun 2000 23:46:41 +0100

reassign 59829 libc6
thanks

[Sorry for the length of this mail. Wildcards are always fiddly to
analyse.]

Ryan Tracey <ryant@thawte.com> wrote:
> Tar no longer excludes the files and directories that previous versions
> used to exclude (sorry, I have no idea with which version the change
> occurred, but it was within the past month or so). For example, to
> exclude all the MS Frontpage extensions files and directories in the
> 'webspace' directory tree, I used to do this:
> 
>         tar xcf /var/tmp/website.tgz --exclude=_\* webspace\
> 
> This used to exclude all the _vti_cnf/ and _private/ directories that
> don't need to be on the main website.

tar checks whether a name is excluded by using the libc function
fnmatch() with FNM_FILE_NAME and FNM_LEADING_DIR. With these flags, a
pattern like "_*" matches a string that contains something matching "_*"
and containing no slashes, followed by a string containing exactly one
slash: that is, the pattern is matched against everything but the final
component of the file name and the preceding slash. "*" will match
"foo/bar", but not "foo" or "foo/bar/baz", using these flags. This
causes tar a good deal of confusion.

There are two other places in names.c where FNM_LEADING_DIR is used;
however, they don't use FNM_FILE_NAME, and here the behaviour of
fnmatch() is different, even in the absence of wildcards. The pattern
"foo" matches "foo", "foo/bar", and "foo/bar/baz", as does the pattern
"*".

Have a look at the output of the following program (0 indicates a
successful match, 1 indicates a failure):

===== cut here =====
#include <fnmatch.h>
#include <stdio.h>

int main()
{
    printf("%d %d %d\n",
	    fnmatch("x", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("*", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("*x", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*x", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*x", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("x*", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x*", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x*", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
}
===== cut here =====

(Incidentally, if you put the two #includes the other way round, you
get:

[cjw44@riva ~/src/fnmatch-bug]$ gcc -c -g test.c
test.c: In function `main':
test.c:7: `FNM_LEADING_DIR' undeclared (first use in this function)
test.c:7: (Each undeclared identifier is reported only once
test.c:7: for each function it appears in.)
test.c:23: `FNM_FILE_NAME' undeclared (first use in this function)

This has got to be a bug too.)

This program outputs:

0 0 0
1 0 1
0 0 0
1 0 1

Thus, final *s will only allow one trailing slash and file name
component in the presence of FNM_FILE_NAME and FNM_LEADING_DIR, whereas
other atoms in patterns don't care how many there are. If you remove the
FNM_FILE_NAME flag so that *s can match slashes, then all these matches
succeed, as you might expect.
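
[Illustration, not part of the original message.] The comparison between flag combinations can be reduced to a small helper. This is a minimal sketch: `count_mismatches` is a hypothetical name, the twelve pattern/name pairs are the ones from the test program above, and the fallback value for `FNM_LEADING_DIR` is the one glibc uses, which is an assumption on other C libraries.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* asks glibc's <fnmatch.h> to expose the GNU flags */
#endif
#include <fnmatch.h>
#include <stddef.h>

/* Fallback for the case where the header still hides the GNU extension
   (for example because another header was included before _GNU_SOURCE
   took effect).  (1 << 3) is glibc's value; an assumption elsewhere. */
#ifndef FNM_LEADING_DIR
# define FNM_LEADING_DIR (1 << 3)
#endif

/* Hypothetical helper: run the twelve pattern/name pairs from the test
   program with the given flags and return how many fail to match. */
int count_mismatches(int flags)
{
    static const char *const patterns[] = { "x", "*", "*x", "x*" };
    static const char *const names[] = { "x", "x/y", "x/y/z" };
    int failures = 0;
    size_t p, s;

    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++)
        for (s = 0; s < sizeof names / sizeof names[0]; s++)
            if (fnmatch(patterns[p], names[s], flags) != 0)
                failures++;
    return failures;
}
```

Calling `count_mismatches(FNM_LEADING_DIR)` should return 0, in line with the statement that every pair succeeds once `*` is allowed to match slashes. With `FNM_PATHNAME | FNM_LEADING_DIR` (`FNM_FILE_NAME` is the GNU name for `FNM_PATHNAME`), the libc discussed in this report would be expected to return 4, one for each 1 in the output above.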

If this (strange, I think) behaviour really is by design and/or POSIX,
then it needs to be documented. If not, the following patch fixes it:

--- glibc-2.1.3.orig/posix/fnmatch.c	Thu Jun 22 23:17:14 2000
+++ glibc-2.1.3/posix/fnmatch.c	Thu Jun 22 23:24:41 2000
@@ -204,25 +204,18 @@
 	  if (c == '\0')
 	    /* The wildcard(s) is/are the last element of the pattern.
 	       If the name is a file name and contains another slash
-	       this does mean it cannot match.  If the FNM_LEADING_DIR
-	       flag is set and exactly one slash is following, we have
-	       a match.  */
+	       this means it cannot match, unless the FNM_LEADING_DIR
+	       flag is set.  */
 	    {
 	      int result = (flags & FNM_FILE_NAME) == 0 ? 0 : FNM_NOMATCH;
 
 	      if (flags & FNM_FILE_NAME)
 		{
-		  const char *slashp = strchr (n, '/');
-
 		  if (flags & FNM_LEADING_DIR)
-		    {
-		      if (slashp != NULL
-			  && strchr (slashp + 1, '/') == NULL)
-			result = 0;
-		    }
+		    result = 0;
 		  else
 		    {
-		      if (slashp == NULL)
+		      if (strchr (n, '/') == NULL)
 			result = 0;
 		    }
 		}

Thanks,

-- 
Colin Watson                                     [cjw44@flatline.org.uk]



Bug reassigned from package `tar' to `libc6'. Request was from Colin Watson <cjw44@flatline.org.uk> to control@bugs.debian.org.


Message sent on to Ryan Tracey <ryant@thawte.com>:
Bug#59829.


Changed Bug title. Request was from Colin Watson <cjw44@flatline.org.uk> to control@bugs.debian.org.


Information forwarded to debian-bugs-dist@lists.debian.org, Ben Collins <bcollins@debian.org>:
Bug#59829; Package libc6.


Acknowledgement sent to cjw44@flatline.org.uk:
Extra info received and forwarded to list. Copy sent to Ben Collins <bcollins@debian.org>.


Message #28 received at 59829@bugs.debian.org:

From: cjw44@flatline.org.uk
To: libc-alpha@sourceware.cygnus.com
Cc: 59829@bugs.debian.org
Subject: fnmatch() behaves oddly with *s and FNM_LEADING_DIR
Date: Sun, 15 Oct 2000 16:13:39 +0100

>Submitter-Id:	net
>Originator:	Colin Watson
>Organization:  riva.ucam.org
>Confidential:	no
>Synopsis:	fnmatch() with FNM_LEADING_DIR matches * inconsistently
>Severity:	non-critical
>Priority:	low
>Category:	libc
>Class:		sw-bug
>Release:	libc-2.1.95
>Environment:
	
Host type: i386-pc-linux-gnu
System: Linux riva 2.4.0-test2 #1 Sun Jun 25 22:05:08 BST 2000 i686 unknown
Architecture: i686

Addons: linuxthreads

Build CC: gcc
Compiler version: 2.95.2 20000220 (Debian GNU/Linux)
Kernel headers: UTS_RELEASE
Symbol versioning: yes
Build static: yes
Build shared: yes
Build pic-default: no
Build profile: yes
Build omitfp: no
Build bounded: no
Build static-nss: no
Stdio: libio

>Description:

This bug was originally reported in Debian bug #59829,
<URL:http://bugs.debian.org/59829>. I'll repeat the problem description
here, with a little editing.

Ryan Tracey <ryant@thawte.com> wrote:
| Tar no longer excludes the files and directories that previous versions
| used to exclude (sorry, I have no idea with which version the change
| occurred, but it was within the past month or so). For example, to
| exclude all the MS Frontpage extensions files and directories in the
| 'webspace' directory tree, I used to do this:
| 
|         tar xcf /var/tmp/website.tgz --exclude=_\* webspace\
| 
| This used to exclude all the _vti_cnf/ and _private/ directories that
| don't need to be on the main website.

tar checks whether a name is excluded by using the libc function
fnmatch() with FNM_FILE_NAME and FNM_LEADING_DIR. With these flags, a
pattern like "_*" matches a string that contains something matching "_*"
and containing no slashes, followed by a string containing exactly one
slash: that is, the pattern is matched against everything but the final
component of the file name and the preceding slash. "*" will match
"foo/bar", but not "foo" or "foo/bar/baz", using these flags - despite
the fact that the pattern "foo" will match all three of these strings
using these flags. This causes tar a good deal of confusion.

There are two other places in names.c where FNM_LEADING_DIR is used;
however, they don't use FNM_FILE_NAME, and here the behaviour of
fnmatch() is different, even in the absence of wildcards. The pattern
"foo" matches "foo", "foo/bar", and "foo/bar/baz", as does the pattern
"*".

In other words, the behaviour of fnmatch() when both of the flags
FNM_FILE_NAME and FNM_LEADING_DIR are specified is counter-intuitive.
The documentation for FNM_LEADING_DIR says that it ignores "a trailing
sequence of characters starting with a `/' in STRING", and the fact that
"x/y/z" matches the pattern "x" seems to confirm that this is indeed "a
trailing sequence" rather than "the shortest trailing sequence".
However, "x/y/z" does not match the pattern "*", even though there is an
available leading directory containing no slashes that matches that
pattern.

>How-To-Repeat:

Have a look at the output of the following program (0 indicates a
successful match, 1 indicates a failure):

===== cut here =====
#include <fnmatch.h>
#include <stdio.h>

int main()
{
    printf("%d %d %d\n",
	    fnmatch("x", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("*", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("*x", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*x", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*x", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("x*", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x*", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x*", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
}
===== cut here =====

(Incidentally, if you put the two #includes the other way round, you
get:

[cjw44@riva ~/src/fnmatch-bug]$ gcc -c -g test.c
test.c: In function `main':
test.c:7: `FNM_LEADING_DIR' undeclared (first use in this function)
test.c:7: (Each undeclared identifier is reported only once
test.c:7: for each function it appears in.)
test.c:23: `FNM_FILE_NAME' undeclared (first use in this function)

This has got to be a bug too.)

This program outputs:

0 0 0
1 0 1
0 0 0
1 0 1

Thus, final *s will only allow one trailing slash and file name
component in the presence of FNM_FILE_NAME and FNM_LEADING_DIR, whereas
other atoms in patterns don't care how many there are. If you remove the
FNM_FILE_NAME flag so that *s can match slashes, then all these matches
succeed, as you might expect.

>Fix:

If this (strange, I think) behaviour really is by design and/or POSIX,
then it needs to be documented. If not, the following patch fixes it. It
has the effect that a wildcard at the end of a pattern with
FNM_FILE_NAME and FNM_LEADING_DIR trivially matches anything, as long as
everything before it matched. That is, it will munch up to the first
slash and then declare that it has found a matching leading directory.

--- glibc-2.1.95/posix/fnmatch_loop.c.orig	Mon Sep 25 16:23:06 2000
+++ glibc-2.1.95/posix/fnmatch_loop.c	Sun Oct 15 16:03:12 2000
@@ -99,25 +99,18 @@
 	  if (c == L('\0'))
 	    /* The wildcard(s) is/are the last element of the pattern.
 	       If the name is a file name and contains another slash
-	       this does mean it cannot match.  If the FNM_LEADING_DIR
-	       flag is set and exactly one slash is following, we have
-	       a match.  */
+	       this means it cannot match, unless the FNM_LEADING_DIR
+	       flag is set.  */
 	    {
 	      int result = (flags & FNM_FILE_NAME) == 0 ? 0 : FNM_NOMATCH;
 
 	      if (flags & FNM_FILE_NAME)
 		{
-		  const CHAR *slashp = STRCHR (n, L('/'));
-
 		  if (flags & FNM_LEADING_DIR)
-		    {
-		      if (slashp != NULL
-			  && STRCHR (slashp + 1, L('/')) == NULL)
-			result = 0;
-		    }
+		    result = 0;
 		  else
 		    {
-		      if (slashp == NULL)
+		      if (STRCHR (n, L('/')) == NULL)
 			result = 0;
 		    }
 		}
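
[Illustration, not part of the original message.] The decision that the patch changes can be sketched in isolation. The two functions below are a minimal restatement of the hunk's logic for a pattern that ends in a wildcard, where `n` is the part of the name still unmatched when that wildcard is reached. The names `trailing_star_before` and `trailing_star_after` and the `SK_*` constants are hypothetical stand-ins, chosen so that the sketch compiles without glibc internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for FNM_FILE_NAME, FNM_LEADING_DIR and the
   fnmatch() return values; they are not the real glibc definitions. */
enum { SK_FILE_NAME = 1 << 0, SK_LEADING_DIR = 1 << 3 };
enum { SK_MATCH = 0, SK_NOMATCH = 1 };

/* Logic before the patch: with both flags set, the unmatched remainder
   must contain exactly one slash for the trailing wildcard to match. */
int trailing_star_before(const char *n, int flags)
{
    const char *slashp;

    assert(n != NULL);
    if ((flags & SK_FILE_NAME) == 0)
        return SK_MATCH;            /* '*' is free to swallow slashes */

    slashp = strchr(n, '/');
    if (flags & SK_LEADING_DIR)
        return (slashp != NULL && strchr(slashp + 1, '/') == NULL)
                   ? SK_MATCH : SK_NOMATCH;
    return slashp == NULL ? SK_MATCH : SK_NOMATCH;
}

/* Logic after the patch: with both flags set, the trailing wildcard
   matches whatever remains, however many slashes follow. */
int trailing_star_after(const char *n, int flags)
{
    assert(n != NULL);
    if ((flags & SK_FILE_NAME) == 0)
        return SK_MATCH;
    if (flags & SK_LEADING_DIR)
        return SK_MATCH;
    return strchr(n, '/') == NULL ? SK_MATCH : SK_NOMATCH;
}
```

With both flags set and `n` equal to "x/y/z", the first function reports no match and the second reports a match. That is the difference between the 1s in the output above and the all-zero output the patch aims for.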




Noted your statement that Bug has been forwarded to libc-alpha@sourceware.cygnus.com. Request was from Colin Watson <cjw44@flatline.org.uk> to control@bugs.debian.org.


Information forwarded to debian-bugs-dist@lists.debian.org, Ben Collins <bcollins@debian.org>:
Bug#59829; Package libc6.


Acknowledgement sent to drepper@cygnus.com (Ulrich Drepper):
Extra info received and forwarded to list. Copy sent to Ben Collins <bcollins@debian.org>.


Message #35 received at 59829@bugs.debian.org:

From: Ulrich Drepper <drepper@redhat.com>
To: cjw44@flatline.org.uk
Cc: libc-alpha@sourceware.cygnus.com, 59829@bugs.debian.org
Subject: Re: fnmatch() behaves oddly with *s and FNM_LEADING_DIR
Date: 15 Oct 2000 10:29:42 -0700

cjw44@flatline.org.uk writes:

> >Synopsis:	fnmatch() with FNM_LEADING_DIR matches * inconsistently

I won't even look at this since FNM_LEADING_DIR is an internal flag.
It can only work in certain situations.  Users cannot use it for
themselves.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------



Information forwarded to debian-bugs-dist@lists.debian.org, Ben Collins <bcollins@debian.org>:
Bug#59829; Package libc6.


Acknowledgement sent to Paul Eggert <eggert@twinsun.com>:
Extra info received and forwarded to list. Copy sent to Ben Collins <bcollins@debian.org>.


Message #40 received at 59829@bugs.debian.org:

From: Paul Eggert <eggert@twinsun.com>
To: drepper@cygnus.com (Ulrich Drepper)
Cc: cjw44@flatline.org.uk, libc-alpha@sourceware.cygnus.com, 59829@bugs.debian.org
Subject: Re: fnmatch() behaves oddly with *s and FNM_LEADING_DIR
Date: 15 Oct 2000 11:12:55 -0700

Ulrich Drepper <drepper@redhat.com> writes:

> > >Synopsis:	fnmatch() with FNM_LEADING_DIR matches * inconsistently
> 
> I won't even look at this since FNM_LEADING_DIR is an internal flag.

The glibc 2.1.3 manual documents FNM_LEADING_DIR to behave the way
that GNU tar uses it.  Has it been withdrawn recently?  If so, why?
GNU tar has been using that flag ever since it was added to glibc in 1992.



Information forwarded to debian-bugs-dist@lists.debian.org, Ben Collins <bcollins@debian.org>:
Bug#59829; Package libc6.


Acknowledgement sent to drepper@cygnus.com (Ulrich Drepper):
Extra info received and forwarded to list. Copy sent to Ben Collins <bcollins@debian.org>.


Message #45 received at 59829@bugs.debian.org:

From: Ulrich Drepper <drepper@redhat.com>
To: Paul Eggert <eggert@twinsun.com>
Cc: cjw44@flatline.org.uk, libc-alpha@sourceware.cygnus.com, 59829@bugs.debian.org
Subject: Re: fnmatch() behaves oddly with *s and FNM_LEADING_DIR
Date: 15 Oct 2000 11:26:43 -0700

Paul Eggert <eggert@twinsun.com> writes:

> The glibc 2.1.3 manual documents FNM_LEADING_DIR to behave the way
> that GNU tar uses it.  Has it been withdrawn recently?  If so, why?
> GNU tar has been using that flag ever since it was added to glibc in 1992.

Tar is probably using it the way it is intended.  But look at the
implementation and you'll see that it handles only a few cases.
Making it general usable (which is not necessary in the first place
IMO) only slows down normal use.  Not changing anything is therefore
also in the best interest of tar.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------



Information forwarded to debian-bugs-dist@lists.debian.org, Ben Collins <bcollins@debian.org>:
Bug#59829; Package libc6.


Acknowledgement sent to Paul Eggert <eggert@twinsun.com>:
Extra info received and forwarded to list. Copy sent to Ben Collins <bcollins@debian.org>.


Message #50 received at 59829@bugs.debian.org:

From: Paul Eggert <eggert@twinsun.com>
To: drepper@cygnus.com (Ulrich Drepper)
Cc: cjw44@flatline.org.uk, libc-alpha@sourceware.cygnus.com, 59829@bugs.debian.org
Subject: Re: fnmatch() behaves oddly with *s and FNM_LEADING_DIR
Date: 15 Oct 2000 15:13:12 -0700

Ulrich Drepper <drepper@redhat.com> writes:

> Paul Eggert <eggert@twinsun.com> writes:
> > The glibc 2.1.3 manual documents FNM_LEADING_DIR to behave the way
> > that GNU tar uses it.  Has it been withdrawn recently?  If so, why?
> > GNU tar has been using that flag ever since it was added to glibc in 1992.
> 
> Tar is probably using it the way it is intended.  But look at the
> implementation and you'll see that it handles only a few cases.

Sorry, I don't follow.  From tar's point of view, it would be OK if
fnmatch misbehaved only in areas that tar doesn't exercise.  But
Debian bug 59829 came from someone who used tar, and found that tar
misbehaves because fnmatch misbehaves.

> Making it general usable (which is not necessary in the first place
> IMO) only slows down normal use.

Why isn't it necessary to make the feature work in general?



Information forwarded to debian-bugs-dist@lists.debian.org, Ben Collins <bcollins@debian.org>:
Bug#59829; Package libc6.


Acknowledgement sent to Paul Eggert <eggert@twinsun.com>:
Extra info received and forwarded to list. Copy sent to Ben Collins <bcollins@debian.org>.


Message #55 received at 59829@bugs.debian.org:

From: Paul Eggert <eggert@twinsun.com>
To: drepper@cygnus.com
Cc: cjw44@flatline.org.uk, 59829@bugs.debian.org
Subject: Re: fnmatch() behaves oddly with *s and FNM_LEADING_DIR
Date: Thu, 19 Oct 2000 17:15:33 -0700 (PDT)

   From: Ulrich Drepper <drepper@redhat.com>
   Date: 15 Oct 2000 15:28:23 -0700

   If there is a problem with tar in the way it uses fnmatch

Yes, that's what's happening.

   then I expect a report coming from the tar maintainer and not some
   random user.

At the end of this message is that bug report again, this time coming
from the tar maintainer (i.e., from me :-).  The bug report contains a
self-contained test program that illustrates the problem.

Two bugs are being reported here.  First, the following code doesn't compile.
(This first bug is not important for tar, but it's clearly a bug.)

  #include <stdio.h>
  #include <fnmatch.h>

Second, FNM_LEADING_DIR mishandles "*" in some cases as shown in the
test program below.  The test program should output all zeros, but it
outputs 1 in some cases.  This is the bug that affects tar.

The bug report also contains a proposed patch, which I haven't verified.
(The proposed patch addresses only the second bug.)

I see that recent versions of other GNU utilities test for fnmatch
bugs like the FNM_LEADING_DIR bug, and use their own fnmatch if the C
library fnmatch doesn't work.  So it seems that other maintainers have
run into similar problems.  I plan to modify GNU tar so that it does a
similar thing.  I don't like doing this, as it's double-maintenance
of what should be the same code, but it's better than the alternative
of GNU tar not working, and it's needed anyway to work around the
problem of bugs in the existing glibc installed base.
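
[Illustration, not part of the original message.] A check of the kind described above might look like the following. This is only a sketch under stated assumptions: `fnmatch_leading_dir_works` is a hypothetical name, it is not the test that any GNU package actually ships, and the fallback value for `FNM_LEADING_DIR` is glibc's.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* asks glibc's <fnmatch.h> to expose the GNU flags */
#endif
#include <fnmatch.h>

/* Fallback in case the header still hides the GNU extension.
   (1 << 3) is glibc's value; an assumption on other C libraries. */
#ifndef FNM_LEADING_DIR
# define FNM_LEADING_DIR (1 << 3)
#endif

/* Hypothetical probe: returns 1 if the C library's fnmatch() accepts the
   four cases that this report shows failing, 0 otherwise.
   FNM_PATHNAME is the POSIX name for what GNU calls FNM_FILE_NAME. */
int fnmatch_leading_dir_works(void)
{
    const int flags = FNM_PATHNAME | FNM_LEADING_DIR;

    return fnmatch("*", "x", flags) == 0
        && fnmatch("*", "x/y/z", flags) == 0
        && fnmatch("x*", "x", flags) == 0
        && fnmatch("x*", "x/y/z", flags) == 0;
}
```

A program could call this once at start-up, or a configure script could run an equivalent test, and fall back to a bundled fnmatch when the result is 0.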


Here's that bug report again:

----

From: cjw44@flatline.org.uk
To: libc-alpha@sourceware.cygnus.com
Cc: 59829@bugs.debian.org
Subject: fnmatch() behaves oddly with *s and FNM_LEADING_DIR
Date: Sun, 15 Oct 2000 16:13:39 +0100

>Submitter-Id:	net
>Originator:	Colin Watson
>Organization:  riva.ucam.org
>Confidential:	no
>Synopsis:	fnmatch() with FNM_LEADING_DIR matches * inconsistently
>Severity:	non-critical
>Priority:	low
>Category:	libc
>Class:		sw-bug
>Release:	libc-2.1.95
>Environment:
	
Host type: i386-pc-linux-gnu
System: Linux riva 2.4.0-test2 #1 Sun Jun 25 22:05:08 BST 2000 i686 unknown
Architecture: i686

Addons: linuxthreads

Build CC: gcc
Compiler version: 2.95.2 20000220 (Debian GNU/Linux)
Kernel headers: UTS_RELEASE
Symbol versioning: yes
Build static: yes
Build shared: yes
Build pic-default: no
Build profile: yes
Build omitfp: no
Build bounded: no
Build static-nss: no
Stdio: libio

>Description:

This bug was originally reported in Debian bug #59829,
<URL:http://bugs.debian.org/59829>. I'll repeat the problem description
here, with a little editing.

Ryan Tracey <ryant@thawte.com> wrote:
| Tar no longer excludes the files and directories that previous versions
| used to exclude (sorry, I have no idea with which version the change
| occurred, but it was within the past month or so). For example, to
| exclude all the MS Frontpage extensions files and directories in the
| 'webspace' directory tree, I used to do this:
| 
|         tar xcf /var/tmp/website.tgz --exclude=_\* webspace\
| 
| This used to exclude all the _vti_cnf/ and _private/ directories that
| don't need to be on the main website.

tar checks whether a name is excluded by using the libc function
fnmatch() with FNM_FILE_NAME and FNM_LEADING_DIR. With these flags, a
pattern like "_*" matches a string that contains something matching "_*"
and containing no slashes, followed by a string containing exactly one
slash: that is, the pattern is matched against everything but the final
component of the file name and the preceding slash. "*" will match
"foo/bar", but not "foo" or "foo/bar/baz", using these flags - despite
the fact that the pattern "foo" will match all three of these strings
using these flags. This causes tar a good deal of confusion.

There are two other places in names.c where FNM_LEADING_DIR is used;
however, they don't use FNM_FILE_NAME, and here the behaviour of
fnmatch() is different, even in the absence of wildcards. The pattern
"foo" matches "foo", "foo/bar", and "foo/bar/baz", as does the pattern
"*".

In other words, the behaviour of fnmatch() when both of the flags
FNM_FILE_NAME and FNM_LEADING_DIR are specified is counter-intuitive.
The documentation for FNM_LEADING_DIR says that it ignores "a trailing
sequence of characters starting with a `/' in STRING", and the fact that
"x/y/z" matches the pattern "x" seems to confirm that this is indeed "a
trailing sequence" rather than "the shortest trailing sequence".
However, "x/y/z" does not match the pattern "*", even though there is an
available leading directory containing no slashes that matches that
pattern.

>How-To-Repeat:

Have a look at the output of the following program (0 indicates a
successful match, 1 indicates a failure):

===== cut here =====
#include <fnmatch.h>
#include <stdio.h>

int main()
{
    printf("%d %d %d\n",
	    fnmatch("x", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("*", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("*x", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*x", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("*x", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
    printf("%d %d %d\n",
	    fnmatch("x*", "x", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x*", "x/y", FNM_FILE_NAME | FNM_LEADING_DIR),
	    fnmatch("x*", "x/y/z", FNM_FILE_NAME | FNM_LEADING_DIR));
}
===== cut here =====

(Incidentally, if you put the two #includes the other way round, you
get:

[cjw44@riva ~/src/fnmatch-bug]$ gcc -c -g test.c
test.c: In function `main':
test.c:7: `FNM_LEADING_DIR' undeclared (first use in this function)
test.c:7: (Each undeclared identifier is reported only once
test.c:7: for each function it appears in.)
test.c:23: `FNM_FILE_NAME' undeclared (first use in this function)

This has got to be a bug too.)

This program outputs:

0 0 0
1 0 1
0 0 0
1 0 1

Thus, final *s will only allow one trailing slash and file name
component in the presence of FNM_FILE_NAME and FNM_LEADING_DIR, whereas
other atoms in patterns don't care how many there are. If you remove the
FNM_FILE_NAME flag so that *s can match slashes, then all these matches
succeed, as you might expect.

>Fix:

If this (strange, I think) behaviour really is by design and/or POSIX,
then it needs to be documented. If not, the following patch fixes it. It
has the effect that a wildcard at the end of a pattern with
FNM_FILE_NAME and FNM_LEADING_DIR trivially matches anything, as long as
everything before it matched. That is, it will munch up to the first
slash and then declare that it has found a matching leading directory.

--- glibc-2.1.95/posix/fnmatch_loop.c.orig	Mon Sep 25 16:23:06 2000
+++ glibc-2.1.95/posix/fnmatch_loop.c	Sun Oct 15 16:03:12 2000
@@ -99,25 +99,18 @@
 	  if (c == L('\0'))
 	    /* The wildcard(s) is/are the last element of the pattern.
 	       If the name is a file name and contains another slash
-	       this does mean it cannot match.  If the FNM_LEADING_DIR
-	       flag is set and exactly one slash is following, we have
-	       a match.  */
+	       this means it cannot match, unless the FNM_LEADING_DIR
+	       flag is set.  */
 	    {
 	      int result = (flags & FNM_FILE_NAME) == 0 ? 0 : FNM_NOMATCH;
 
 	      if (flags & FNM_FILE_NAME)
 		{
-		  const CHAR *slashp = STRCHR (n, L('/'));
-
 		  if (flags & FNM_LEADING_DIR)
-		    {
-		      if (slashp != NULL
-			  && STRCHR (slashp + 1, L('/')) == NULL)
-			result = 0;
-		    }
+		    result = 0;
 		  else
 		    {
-		      if (slashp == NULL)
+		      if (STRCHR (n, L('/')) == NULL)
 			result = 0;
 		    }
 		}




Severity set to `fixed'. Request was from Colin Watson <cjw44@flatline.org.uk> to control@bugs.debian.org.


Message sent on to Ryan Tracey <ryant@thawte.com>:
Bug#59829.


Message #60 received at 59829-submitter@bugs.debian.org:

From: Colin Watson <cjw44@flatline.org.uk>
To: control@bugs.debian.org, 59829-submitter@bugs.debian.org
Subject: Fixed in 2.1.96
Date: Thu, 14 Dec 2000 00:10:05 +0000

severity 59829 fixed
thanks

I believe this bug was fixed in Ulrich Drepper's checkin of 2000-10-21,
and thus in Debian libc6-2.1.96-1. Ryan, if you have a woody box, do you
still see this bug?

-- 
Colin Watson                                     [cjw44@flatline.org.uk]



Reply sent to Ben Collins <bcollins@debian.org>:
You have taken responsibility.


Notification sent to Ryan Tracey <ryant@thawte.com>:
Bug acknowledged by developer.


Message #65 received at 59829-done@bugs.debian.org:

From: Ben Collins <bcollins@debian.org>
To: 59829-done@bugs.debian.org
Cc: Colin Watson <cjw44@flatline.org.uk>
Subject: Re: Processed: Fixed in 2.1.96
Date: Wed, 13 Dec 2000 19:06:42 -0500

On Wed, Dec 13, 2000 at 06:03:27PM -0600, Debian Bug Tracking System wrote:
> Processing commands for control@bugs.debian.org:
> 
> > severity 59829 fixed
> Bug#59829: libc6: [PATCH] fnmatch() behaves oddly with *s and FNM_LEADING_DIR
> Severity set to `fixed'.

Thanks, closing.

-- 
 -----------=======-=-======-=========-----------=====------------=-=------
/  Ben Collins  --  ...on that fantastic voyage...  --  Debian GNU/Linux   \
`  bcollins@debian.org  --  bcollins@openldap.org  --  bcollins@linux.com  '
 `---=========------=======-------------=-=-----=-===-======-------=--=---'



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri May 3 11:48:42 2024; Machine Name: buxtehude