Re: ext2fs Corruption Comments



> Could some of the changes in the new linux device drivers have
> introduced this same race condition?

What device driver changes are you talking about?  Unless you are using
OSKit-Mach instead of GNUmach, nothing about the device drivers has changed
much at all in a long time.

I suspect that you are in fact referring to changes in the Hurd ext2fs
filesystem server.  In the Hurd, device drivers and filesystems are very
separate; in fact, the device drivers are in the Mach kernel, and the
filesystems are individual Hurd server programs.

> I realize that we don't use ioctl's in the servers, but perhaps they get
> called in some way through the C library when running standard Linux
> utilities, which results in the rpc analog of this ioctl being invoked.
> For example, the "pokel_sync" routine of the ext2fs server seems like a
> possible candidate.

There is nothing like any of the Linux BLK* ioctls in the Hurd.  

Moreover, the fundamental design and implementation of things like
filesystem synchronization are vastly different in the Hurd from what they
are in the Linux kernel.  There might well be a bug of a broadly similar
character in the Hurd, but for a thing like this a Linux implementation
detail is wholly unrelated to how the Hurd does things.

So, your intuitions about this are right in part and also dead wrong.  It
is the case that some sort of synchronization race is surely at the heart
of the Hurd's ext2fs corruption problems.  You are also right that
pokel_sync is an integral part of how synchronization is handled in the
ext2fs server, and that it (or more likely its callers) is in the tangle
that produces the filesystem corruption.  But to draw a direct parallel
between some particular bug fix in the Linux buffer cache infrastructure
and what is going on in the Hurd is straight out of left field.

To come equally from left field vis-a-vis this message: if we suspect a
problem in the pokel weirdness (and well we might), then a useful data
point would be to make an ext2fs with a block size of 4096 and see whether
the filesystem corruption bug can be reproduced on that partition.
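
Before running that experiment it would be worth confirming that the test
partition (made, say, with mke2fs -b 4096) really has 4096-byte blocks.
The following is only a sketch of such a check.  It assumes the standard
ext2 on-disk layout (superblock at byte offset 1024, s_log_block_size at
offset 24 within it, s_magic at offset 56); the function name is made up
for illustration and is not part of the Hurd or e2fsprogs.

```python
import struct

EXT2_SUPERBLOCK_OFFSET = 1024  # superblock starts 1024 bytes into the device
EXT2_MAGIC = 0xEF53            # expected value of s_magic


def ext2_block_size(image: bytes) -> int:
    """Return the block size in bytes recorded in an ext2 superblock.

    `image` must hold at least the first 2048 bytes of the device.
    (Hypothetical helper, for illustration only.)
    """
    sb = image[EXT2_SUPERBLOCK_OFFSET:EXT2_SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 58:
        raise ValueError("image too short to hold an ext2 superblock")
    # s_log_block_size: little-endian 32-bit field at offset 24
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    # s_magic: little-endian 16-bit field at offset 56
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 filesystem (bad magic)")
    # Block size is stored as a shift count relative to 1024 bytes.
    return 1024 << log_block_size
```

To use it, read the first 2048 bytes of the partition's device node (the
path depends on your setup) and pass them in; a result of 4096 means the
partition is suitable for the test.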

> I'm attaching the final "successful" patch for your review, in the
> hope that it will be helpful.

I may never comprehend why people find it necessary to use uuencode or
base64-encoded attachments to send a few hundred lines of text.  The
uuencoding is nearly twice the size of the actual patch, and I have to go
out of my way to read it.  I did go out of my way and read it, just to make
sure it was the sort of thing I thought it was.  It is: the patch touches a
whole layer of implementation abstraction where Linux and the Hurd share
just about nothing at all.

