Re: Random idea:

To: debian-hurd@lists.debian.org
Subject: Re: Random idea:
From: Brian May <bam@debian.org>
Date: 31 Dec 1999 12:16:05 +1100
Message-id: <84yaabj4fe.fsf@snoopy.apana.org.au>
In-reply-to: dirk@gnumatic.de's message of "30 Dec 1999 11:28:17 +1100"
References: <06a701bf5210$47a95fa0$0e00000a@chen> <14442.30193.454091.23635@gnumatic.de>
>>>>> "Dirk" == Dirk Ritter <dirk@gnumatic.de> writes:

This is just my two cents:

    >> 2) Add mime types to the file information in the file
    >> system. This will make the file command trivial, and will allow
    >> some pretty neat things to be done (like writing a program that
    >> will act differently depends on the file type. "run $1" that
    >> will open gimp for graphics and vi for text...)

    Dirk> This of course seems like a nice idea, but it has some
    Dirk> pitfalls: - file systems would need to generate this
    Dirk> information "on the fly" (you should expect other OS's to
    Dirk> manipulate data, so you cannot rely on stored type

I think, for initial type identification, some sort of magic
database should be used. Once the file type has been identified,
the problem is solved, as the MIME type is saved (somehow)
to disk.
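
A rough sketch of what I mean, in Python. The table is a tiny made-up
stand-in for a real magic database, and the "store" dictionary only
stands in for whatever on-disk storage would really be used; all of
the names are hypothetical:

```python
# Hypothetical, tiny stand-in for a real magic database:
# (offset, signature bytes, MIME type).
MAGIC_TABLE = [
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"\x1f\x8b", "application/gzip"),
]


def sniff_mime_type(data, default="application/octet-stream"):
    """Return the type of the first signature that matches the data."""
    for offset, signature, mime_type in MAGIC_TABLE:
        if data[offset:offset + len(signature)] == signature:
            return mime_type
    return default


def identify_once(path, store):
    """Sniff the file only if no type has been saved for it yet."""
    if path not in store:
        with open(path, "rb") as handle:
            store[path] = sniff_mime_type(handle.read(16))
    return store[path]
```

The slow magic lookup then happens once per file; every later query
just reads the saved type.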

It should also be possible to change the identification and/or default
behaviour of programs with ease.

Also, somebody on debian-devel pointed out that a .c file could be
both a C source file (to a compiler) and a text file (to an editor).
This means there are two possible actions that could be associated
with the file: open in a text editor, and compile using a C compiler
(not to mention other possibilities, eg CVS, changelogs, etc). You
wouldn't want a text editor (eg emacs) to automatically try compiling
the file when you really want to edit it.
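
One way this could look, as a sketch only: each type carries several
named actions, and the caller says which one it wants. The table, the
action names and the commands are all invented for illustration:

```python
# Hypothetical action table: MIME type -> {action name: command prefix}.
ACTIONS = {
    "text/x-csrc": {"edit": ["vi"], "compile": ["cc", "-c"]},
    "text/plain": {"edit": ["vi"]},
}

DEFAULT_ACTION = "edit"


def command_for(mime_type, path, action=DEFAULT_ACTION):
    """Build the command line for one named action on a file."""
    actions = ACTIONS.get(mime_type, {})
    if action not in actions:
        raise LookupError("no %r action for %s" % (action, mime_type))
    return actions[action] + [path]
```

Here command_for("text/x-csrc", "foo.c") gives the editor command,
and compiling only happens when "compile" is asked for explicitly.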

    Dirk> information - you would need to verify this at least during
    Dirk> the startup of the translator, so you better should do it on

Verification, if really required, would only need to be done
after the file type is first identified.

    Dirk> the fly - if at all) - types would need to be determined
    Dirk> from the data alone so that foo.txt would be clearly
    Dirk> identifiable as X-Bitmap if foo.txt's data is a valid
    Dirk> X-Bitmap - but what are you going to do about foo.txt.gz?
    Dirk> After all - you are still interested in the X-Bitmap and you
    Dirk> probably regard compression as something that should be
    Dirk> somewhat transparent, don't you.

What does MIME currently do for these file types? eg I think most web
browsers can cope with these files (from what I have read in
debian-devel at least).
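
As far as I can tell, MIME keeps the content type and the content
encoding separate, so compression is treated as a layer on top of the
type rather than as a new type. Python's standard mimetypes module
works the same way for file names, which makes for a short
illustration (extension based only, it does not look at the data):

```python
import mimetypes

# guess_type() returns a (type, encoding) pair; the compression suffix
# is reported as the encoding and the type comes from what is left.
mime_type, encoding = mimetypes.guess_type("foo.txt.gz")
print(mime_type, encoding)
```

A stored type could be handled the same way: save the type of the
uncompressed data, plus a separate note that it is gzip-encoded.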

As an example where MIME types stored in the filesystem would help:

Previously I received an e-mail attachment[1] with the MIME type
image/jpeg. Its file name was image.jpe (ie it was truncated). If I
saved the file, it would be saved, by default, as image.jpe, and
programs wouldn't know what file type that was meant to be (unless
they used magic algorithms). However, if the mail program saved the
MIME type to disk too, then this problem wouldn't occur.
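
A minimal sketch of how a mail program might do this, assuming a
system and filesystem with extended attribute support (as exposed by
Python's os.setxattr on Linux). The attribute name "user.mime_type"
and the function names are my assumptions, not an agreed interface:

```python
import os

# Assumed attribute name; nothing here is a settled convention.
XATTR_NAME = "user.mime_type"


def save_mime_type(path, mime_type):
    """Try to record the type next to the data; report whether it worked."""
    try:
        os.setxattr(path, XATTR_NAME, mime_type.encode("utf-8"))
        return True
    except (OSError, AttributeError):
        # No extended attribute support on this filesystem or platform.
        return False


def load_mime_type(path):
    """Return the recorded type, or None if nothing was stored."""
    try:
        return os.getxattr(path, XATTR_NAME).decode("utf-8")
    except (OSError, AttributeError):
        return None
```

With this, image.jpe would still be found to be image/jpeg, whatever
happened to its name, as long as the attribute survives copying.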

    Dirk> However, you might be interested in the following: - ESR
    Dirk> once wanted to work on a library that implements file type
    Dirk> identification - the state of this project is unknown to me,

I think that this would be a good start. If programs used this library
to identify the file type, then the administrator would be free to use
any method he/she desires, without having to change every application
program. For instance, there are numerous possibilities:

- file extension and some way to override (eg apache does this).

- magic file types (eg gimp does this). According to apache
documentation, this is slow.

- read mime type as stored on filesystem (nothing does this).

- others I may have missed.

I think the choice of method should be left up to the system
administrator.
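
To make that concrete, here is a sketch of how such a library could
let the administrator pick the methods and their order. This is not
ESR's design (I haven't seen it); the function names and the two-entry
magic check are invented for illustration:

```python
import mimetypes


def by_extension(path):
    """Guess from the file name only."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type


def by_magic(path):
    """Guess from the first few bytes (two signatures only, as a demo)."""
    try:
        with open(path, "rb") as handle:
            head = handle.read(8)
    except OSError:
        return None
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return None


def make_identifier(methods):
    """Build an identify() that tries each method in the given order."""
    def identify(path):
        for method in methods:
            mime_type = method(path)
            if mime_type is not None:
                return mime_type
        return "application/octet-stream"
    return identify
```

An application would only ever call identify(); whether that means
make_identifier([by_extension, by_magic]) or the reverse order would
be decided once, in the system configuration.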

    Dirk> Admittedly I once used OS/2. It offered extended attributes

Disclaimer: it is a long time since I last used OS/2...

The problem with the OS/2 WPS is that it had two different methods
for file type identification. This made it complicated, and it seems
from your message that you were only aware of one of them.

Method 1:
- based on file extension. This is very similar to current versions
of Windows. I think (not sure now) that:

  1. when you associated a file extension with an application, then
  all data files would show that application's icon.

  2. You could associate multiple applications for one file extension -
  this is something I really miss now.

  3. You could override the icon for any file you desired.

  4. Meant to be used for backward compatibility only (I think).

Method 2:

- based on object orientated approach, using SOM (its name at the
time - I think it has a new name now).

- each file (and other objects) was identified as an object (similar
to an object in C++ programming, but not specific to any language).

- each object had a class, and classes have methods. These methods
are inherited from parent classes, and included operations like
"open file", or "get icon".

- Each object had a unique id (compare with inode number), and files
were associated with classes based on this unique ID (don't really
understand how this worked). I don't think this association had
anything to do with EAs, but might be mistaken. This unique ID was
also used to create links between files (IIRC).

This seemed a good idea at the time, however, had a number of
problems:

- every file had an associated object, but some objects didn't have an
associated file (eg configuration objects).

- some object orientated features weren't usable by non-SOM
applications, eg links between two files (can't remember what OS/2
called these now: shadows??). These links acted like hard links in
Unix, but I think they could cross filesystem boundaries (not sure
about that now). Another feature supported only by SOM objects: long
file names on FAT (not VFAT) were implemented via EAs.

- wasn't widely used except by the base OS/2 system. I suspect a lot
of people didn't understand it, and it required thinking about
applications in a completely different (and non-portable) way.

- if one SOM object crashed, the entire WPS would crash, as everything
was run in the same address space (unless the object split itself up
into multiple processes). Distributed SOM was meant to fix this
problem, not sure if it ever was implemented. In theory, the WPS
should automatically restart if it crashed, but this didn't always
happen...

(side note: would a distributed version of WPS look anything like the
Hurd where translators can be attached to any file? Not sure if you
can inherit from parent translators though, but I think it should be
possible... Only difference I see is that the methods in Hurd are
lower level, and not "open file in editor", "view icon" or "print this
file", but rather more like "get this file's contents" or "save this
file's contents").

- very difficult, if not near impossible, for the user to change a
file from one class to another.

- if an object was associated with a high level class (eg application
class), then file type extensions were completely ignored, and
the application class took priority.

- the WPS configuration (as stored on disk) was very complicated (from
the user's point of view) and seemed easy to break.

Still, I think it is a pity it wasn't implemented better, as it could
have been rather good.

    Dirk> on the native file system and an emulation of extended
    Dirk> attributes on other file systems where such Information got
    Dirk> stored in files '\wp root. sf' and '\ea data. sf' or
    Dirk> something similar. Cute as it looked - I don't want it back.

I think this was how EAs were stored on FAT. HPFS was better, and less
of a hack.

    Dirk> The Workplace shell used to store icons inside those
    Dirk> attributes, but I don't see how anyone would benefit from
    Dirk> that since this way you end up with lots of duplicated
    Dirk> bitmap data - a waste of space and resources and maybe it

I think this only occurred if you changed the icons for the individual
files.

    Dirk> The remaining question is if applications should be provided
    Dirk> with means to store data related to a file inside resource
    Dirk> forks (Mac speak) or extended attributes (OS/2
    Dirk> terminology). I guess there is no real need for it - OS/2
    Dirk> EA's look like the environment with 'Key=VALUE' pairs and as
    Dirk> appealing as it might seem - I found it essentially useless,
    Dirk> especially since this did lead to unwanted side effects
    Dirk> (double clicking a template for instance could lock the
    Dirk> system considerably since the Workplace shell did not ignore
    Dirk> this action for templates, thereby rendering the template
    Dirk> useless - fortunately I knew how to back up and copy EA's,
    Dirk> so that I could spare a friend of mine at least four
    Dirk> re-installations of OS/2.).

This is an implementation issue, and a stupid design decision by IBM,
if you ask me. IBM probably thought people would want to edit template
files, but to most people this default action is confusing.  A better
default action would have been "create new file". However, opening a
template file should not normally crash the computer...

Please don't confuse these implementation issues with the idea
itself. Just because somebody else made a bad design decision doesn't
mean we need to repeat it.

The main problem I see with EAs, forks, etc, is every OS vendor has a
completely different (and incompatible) implementation (or no
implementation at all). This makes backups, for instance, specific to
the OS used. I think EXT3 might have its own version too, not sure,
but whoever is interested should have a look.

I think both the Apple and IBM approaches are very similar, but IBM
was optimized for smaller sets of data being stored.

Note:
[1] actually, I received the file on MS outlook, which got completely
confused at the unknown extension, despite the MIME type.

-- 
Brian May <bam@debian.org>