
NFS + locks = no good... [ (Windowmaker|GNOME) related ]



[ relevant bg info: me = maintainer of wmaker and libproplist packages ]

Hi,

	a couple of days ago I upgraded my designated-slink-machine to
kernel 2.1.118 (which I *need* here), and I was surprised to find out that
WindowMaker stopped working (I think it was the kernel upgrade, but I did
upgrade some packages at the same time -- I can't reboot back into 2.1.111
or 2.1.103 at the moment). After some investigation, I found out that this
was libproplist's (lpl) fault. Here's the relevant code:

  flk.l_type = F_RDLCK;
  flk.l_start = 0;
  flk.l_whence = SEEK_SET;
  flk.l_len = 0;
  if ((fcntl(fd, F_SETLK, &flk)<0))
    {
      close(fd);
      free(actual_filename);
      return NULL;
    }
  if(fstat(fd, &fstat_buf)<0)
    {
      close(fd);
      free(actual_filename);
      return NULL;
    }
  str = (char *)MyMalloc(__FILE__, __LINE__, sizeof(char)*(fstat_buf.st_size+32));
  if(read(fd, str, fstat_buf.st_size) != fstat_buf.st_size)
    {
      close(fd);
      MyFree(__FILE__, __LINE__, str);
      flk.l_type = F_UNLCK;
      fcntl(fd, F_SETLK, &flk);
      return NULL;
    }

  str[fstat_buf.st_size + 1] = '\0';
  flk.l_type = F_UNLCK;
  if (fcntl(fd, F_SETLK, &flk)<0)
    {
      close(fd);
      MyFree(__FILE__, __LINE__, str);
      fprintf(stderr, "PLGetPropListWithPath(): Couldn't unlock file!\n");
      return NULL;
    }

  close(fd);

As you can see, this is pretty simple: it locks the file, reads the entire
file, and unlocks the file.

On an NFS filesystem with 2.0.x, there are no locks, and this fails silently,
i.e., this doesn't give an error but still reads the file. With 2.1.x (x >~
70) this fails abruptly: the function returns NULL, and the calling program
has no way of knowing there was an NFS locking problem, which eventually
makes (made?) WindowMaker fail. I suspect other programs will exhibit the
same behaviour.
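
To illustrate the "no way of knowing" part: a minimal sketch of a reader
that tells its caller *why* it failed instead of returning a bare NULL. This
is not lpl code; read_locked_file() and the read_status values are
hypothetical names made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical status codes; not part of libproplist. */
enum read_status { READ_OK, READ_OPEN_FAILED, READ_LOCK_FAILED, READ_IO_FAILED };

/* Read the whole file at `path` into a NUL-terminated buffer stored in *out.
 * On READ_OK the caller owns *out and must free() it. */
enum read_status read_locked_file(const char *path, char **out)
{
    assert(path != NULL && out != NULL);
    *out = NULL;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return READ_OPEN_FAILED;

    struct flock flk = {0};
    flk.l_type = F_RDLCK;
    flk.l_whence = SEEK_SET;
    flk.l_start = 0;
    flk.l_len = 0;                      /* 0 means "up to end of file" */
    if (fcntl(fd, F_SETLK, &flk) < 0) {
        int saved = errno;              /* keep the reason, e.g. ENOLCK */
        close(fd);
        errno = saved;
        return READ_LOCK_FAILED;
    }

    struct stat sb;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return READ_IO_FAILED;
    }

    char *buf = malloc((size_t)sb.st_size + 1);
    if (buf == NULL || read(fd, buf, (size_t)sb.st_size) != sb.st_size) {
        free(buf);
        close(fd);
        return READ_IO_FAILED;
    }
    buf[sb.st_size] = '\0';             /* terminator goes right after the data */

    close(fd);                          /* closing the descriptor drops the lock */
    *out = buf;
    return READ_OK;
}
```

With something like this, a caller could see READ_LOCK_FAILED, look at
errno, and decide for itself whether reading without a lock is acceptable.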

I made the error test a bit looser by testing errno for ENOLCK or EINVAL
(according to the libc info pages, fcntl should return EINVAL on filesystems
that don't support locks; looking at the kernel source code, this is in fact
what happens). For some reason this stopped working with 2.1.118.
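
For reference, the looser test amounts to roughly the following. This is a
sketch only: lock_unsupported() and try_read_lock() are hypothetical names,
not the actual lpl patch, and since the test stopped working with 2.1.118
there may well be other errno values involved that this does not cover.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: does this errno mean "locking is not available
 * here" rather than a genuine failure? Only the two values named above
 * are treated that way. */
int lock_unsupported(int err)
{
    return err == ENOLCK || err == EINVAL;
}

/* Try to take a whole-file read lock on fd.
 * Returns 0 if the lock was taken, 1 if locking is unavailable and the
 * caller may go on and read unlocked, -1 on any other failure. */
int try_read_lock(int fd)
{
    assert(fd >= 0);

    struct flock flk = {0};
    flk.l_type = F_RDLCK;
    flk.l_whence = SEEK_SET;
    flk.l_start = 0;
    flk.l_len = 0;                      /* 0 means "up to end of file" */

    if (fcntl(fd, F_SETLK, &flk) == 0)
        return 0;
    return lock_unsupported(errno) ? 1 : -1;
}
```

The read function would then bail out only when try_read_lock() returns -1.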

Now the gory "solution" (hold on to your seats): remove the locking code.
Yikes!

a) This function reads the files. (lpl)
b) The write function doesn't use locks. (lpl)
c) Locking is not mandatory by default (kernel)

Does a)+b)+c) make sense at all? If there are reading locks but no writing
locks and locking is not mandatory, is locking any good?
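
To make the question concrete, here is a small demo of what advisory locking
does and doesn't do, assuming a local filesystem and a writable /tmp. One
process holds a read lock; a second process writes to the same file without
asking for any lock (like the lpl write function), then asks for a write
lock. unlocked_write_succeeds() is a hypothetical name made up for this
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a write that never asked for a lock went through while
 * another process held a read lock, 0 if it did not, -1 on setup errors.
 * *wrlock_refused is set to 1 if a writer that *does* ask for F_WRLCK
 * was turned away. */
int unlocked_write_succeeds(int *wrlock_refused)
{
    assert(wrlock_refused != NULL);

    char path[] = "/tmp/advisoryXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    struct flock flk = {0};
    flk.l_type = F_RDLCK;
    flk.l_whence = SEEK_SET;
    flk.l_start = 0;
    flk.l_len = 0;
    if (fcntl(fd, F_SETLK, &flk) < 0) {     /* parent: take the read lock */
        close(fd);
        unlink(path);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (pid == 0) {                         /* child: the "writer" */
        int result = 0;
        int wfd = open(path, O_WRONLY);
        if (wfd < 0)
            _exit(4);
        if (write(wfd, "x", 1) == 1)        /* no lock requested at all */
            result |= 1;
        flk.l_type = F_WRLCK;               /* now behave cooperatively */
        if (fcntl(wfd, F_SETLK, &flk) < 0 &&
            (errno == EAGAIN || errno == EACCES))
            result |= 2;
        _exit(result);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    close(fd);                              /* also drops the read lock */
    unlink(path);
    if (waited != pid || !WIFEXITED(status) || (WEXITSTATUS(status) & 4))
        return -1;

    *wrlock_refused = (WEXITSTATUS(status) & 2) != 0;
    return WEXITSTATUS(status) & 1;
}
```

The read lock only stops writers that ask for a lock themselves; a writer
that never calls fcntl is not held back by it.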

This is somewhat GNOME related because it seems like GNOME is going to start
using lpl for GNOME apps (I just wanted to point this out before someone
starts getting "app doesn't even start" bugs).

Please bear in mind that the usual workarounds for file locking over NFS
don't work here, because *users* have to be able to put a read lock on
*system files*.

Is removing the locking code (taking into account all the previous stuff)
acceptable here?


					Marcelo

