Hurd: Using and programming the Hurd kernel servers.
Copyright (C) 1994--1998 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions.
The GNU Hurd(1) is the GNU Project's replacement for the Unix kernel. The Hurd is a collection of servers that run on the Mach microkernel to implement file systems, network protocols, file access control, and other features that are normally implemented by the Unix kernel or similar kernels (such as Linux).
This manual is designed to be useful to everybody who is interested in using, administering, or programming the Hurd.
If you are an end-user and you are looking for help on running the Hurd, the first few chapters of this manual describe the essential parts of installing, starting up, and shutting down a Hurd workstation. If you need help with a specific program, the best way to use this manual is to find the program's name in the index and go directly to the appropriate section. You may also wish to try running program --help, which will display a brief usage message for program (see section Foundations).
The rest of this manual is a technical discussion of the Hurd servers and their implementation, and will not be helpful until you want to learn how to modify the Hurd.
This manual is organized according to the subsystems of the Hurd, and each chapter begins with descriptions of utilities and servers that are related to that subsystem. If you are a system administrator, and you want to learn more about, say, the Hurd networking subsystem, you can skip to the networking chapter (see section Networking), and read about the related utilities and servers.
Programmers who are interested in learning how to modify Hurd servers, or write new ones, should begin by learning about a microkernel to which the Hurd has been ported (currently only GNU Mach) and reading section Foundations. You should then familiarize yourself with a subsystem that interests you by reading about existing servers and the libraries they use. At that point, you should be able to study the source code of existing Hurd servers and understand how they use the Hurd libraries.
The final level of mastery is learning the RPC(2) interfaces which the Hurd libraries implement. The last section of each chapter describes any Hurd interfaces used in that subsystem. Those sections assume that you are perusing the referenced interface definitions as you read. After you have understood a given interface, you will be in a good position to improve the Hurd libraries, design your own interfaces, and implement new subsystems.
The Hurd is not the most advanced operating system known to the planet (yet), but it does have a number of enticing features:
FIXME: overview of basic Hurd architecture, FAQish in nature
Richard Stallman (RMS) started GNU in 1983, as a project to create a complete free operating system. In the text of the GNU Manifesto, he mentioned that there is a primitive kernel. In the first GNUsletter, Feb. 1986, he says that GNU's kernel is TRIX, which was developed at the Massachusetts Institute of Technology.
By December of 1986, the Free Software Foundation (FSF) had "started working on the changes needed to TRIX" [Gnusletter, Jan. 1987]. Shortly thereafter, the FSF began "negotiating with Professor Rashid of Carnegie-Mellon University about working with them on the development of the Mach kernel" [Gnusletter, June 1987]. The text implies that the FSF wanted to use someone else's work, rather than have to fix TRIX.
In [Gnusletter, Feb. 1988], RMS was talking about taking Mach and putting the Berkeley Sprite filesystem on top of it, "after the parts of Berkeley Unix... have been replaced."
Six months later, the FSF was saying that "if we can't get Mach, we'll use TRIX or Berkeley's Sprite." Here, Sprite is presented as a full-kernel option, rather than just a filesystem.
In January, 1990, they say "we aren't doing any kernel work. It does not make sense for us to start a kernel project now, when we still hope to use Mach" [Gnusletter, Jan. 1990]. Nothing significant occurs until 1991, when a more detailed plan is announced:
``We are still interested in a multi-process kernel running on top of Mach. The CMU lawyers are currently deciding if they can release Mach with distribution conditions that will enable us to distribute it. If they decide to do so, then we will probably start work. CMU has available under the same terms as Mach a single-server partial Unix emulator named Poe; it is rather slow and provides minimal functionality. We would probably begin by extending Poe to provide full functionality. Later we hope to have a modular emulator divided into multiple processes.'' [Gnusletter, Jan. 1991].
RMS explains the relationship between the Hurd and Linux in http://www.gnu.org/software/hurd/hurd-and-linux.html, where he mentions that the FSF started developing the Hurd in 1990. As of [Gnusletter, Nov. 1991], the Hurd (running on Mach) is GNU's official kernel.
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place -- Suite 330, Boston, MA 02111-1307, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.
Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
NO WARRANTY
If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.
one line to give the program's name and an idea of what it does. Copyright (C) 19yy name of author This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) 19yy name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. signature of Ty Coon, 1 April 1989 Ty Coon, President of Vice
This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License.
Before you can use the Hurd on your favorite machine, you'll need to install all of its software components. Currently, the Hurd only runs on Intel i386-compatible architectures (such as the Pentium), using the GNU Mach microkernel.
If you have unsupported hardware or a different microkernel, you will not be able to run the Hurd until all the required software has been ported to your architecture. Porting is an involved process which requires considerable programming skill, and is not recommended for the faint of heart. If you have the talent and desire to do a port, contact bug-hurd@gnu.org in order to coordinate the effort.
By far the easiest and best way to install the Hurd is to obtain a GNU binary distribution. Even if you plan on recompiling the Hurd itself, it is best to start off with an already-working GNU system so that you can avoid having to reboot every time you want to test a program.
You can get GNU from a friend under the conditions allowed by the GNU GPL (see section GNU General Public License). Please consider sending a donation to the Free Software Foundation so that we can continue to improve GNU software.
You can also FTP the complete GNU system from your closest GNU mirror, or ftp://ftp.gnu.org/pub/gnu/. The GNU binary distribution is available in a subdirectory called `gnu-n.m', where n.m is the version of the Hurd that this GNU release corresponds to (0.2 at the time of this writing). Again, please consider donating to the Free Software Foundation.
The format of the binary distribution is prone to change, so this manual does not describe the details of how to install GNU. The `README' file distributed with the binary distribution gives you complete instructions.
After you follow all the appropriate instructions, you will have a working GNU/Hurd system. If you have used Linux-based GNU systems or other Unix-like systems before, the Hurd will look quite familiar. You should play with it for a while, referring to this manual only when you want to learn more about the Hurd. Have fun!
If the Hurd is your first introduction to the GNU operating system, then you will need to learn more about GNU in order to be able to use it. You should talk to friends who are familiar with GNU, in order to find out about classes, online tutorials, or books which can help you learn more about GNU.
If you have no friends who are already using GNU, you can find some useful starting points at the GNU web site, http://www.gnu.org/. You can also send e-mail to help-hurd@gnu.org, to contact fellow Hurd users. You can join this mailing list by sending a request to help-hurd-request@gnu.org.
Another way to install the Hurd is to use an existing operating system in order to compile all the required Hurd components from source code. This is called cross-compiling, because it is done between two different platforms.
This process is not recommended unless you are porting the Hurd to a new platform. Cross-compiling the Hurd to a platform which already has a binary distribution is a tremendous waste of time... it is frequently necessary to repeat steps over and over again, and you are not even guaranteed to get a working system. Please, obtain a GNU binary distribution (see section Binary Distributions), and use your time to do more useful things. If you are capable of cross-compiling, then you are definitely skilled enough to make more useful (and creative) modifications to the GNU system.
To emphasize this point: downloading the entire GNU system over a 9600 baud modem takes much less time than cross-compilation, and provides better results, too.
If you are still sure that you would like to cross-compile the Hurd, you should send e-mail to the bug-hurd@gnu.org mailing list in order to coordinate your efforts. People on that list will give you advice on what to look out for, as well as helping you figure out a way that your cross-compilation can benefit Hurd development. After that, don your bug-resistant suit, and read the `INSTALL-cross' file, which comes with the latest Hurd source code distribution. The instructions in `INSTALL-cross' are usually out-of-date, but they contain some useful hints buried amongst the errors.
Bootstrapping(3) is the procedure by which your machine loads the microkernel and transfers control to the Hurd servers.
The bootloader is the first software that runs on your machine. Many hardware architectures have a very simple startup routine which reads a very simple bootloader from the beginning of the internal hard disk, then transfers control to it. Other architectures have startup routines which are able to understand more of the contents of the hard disk, and directly start a more advanced bootloader.
Currently, GRUB(4) is the preferred GNU bootloader. GRUB provides advanced functionality, and is capable of loading several different kernels (such as Linux, DOS, and the *BSD family).
From the standpoint of the Hurd, the bootloader is just a mechanism to
get the microkernel running and transfer control to serverboot.
You will need to refer to your bootloader and microkernel documentation
for more information about the details of this process.
The serverboot program is responsible for loading and executing
the rest of the Hurd servers. Rather than containing specific
instructions for starting the Hurd, it follows general steps given in a
user-supplied boot script.
To boot the Hurd, the microkernel must start serverboot as its
first task, and pass it appropriate arguments. serverboot may
also be invoked while the Hurd is already running, which allows users to
start their own complete sub-Hurds (see section Recursive Bootstrap).
serverboot
The serverboot program has the following synopsis:
serverboot -switch... [[host-port device-port] root-name]
Each switch is a single character, out of the following set:
All the switches are put into the ${boot-args} script
variable.
host-port and device-port are integers which represent the
microkernel host and device ports, respectively (and are used to
initialize the ${host-port} and ${device-port} boot
script variables). If these ports are not specified, then
serverboot assumes that the Hurd is already running, and fetches
the current ports from the procserver (FIXME xref).
root-name is the name of the microkernel device that should be
used as the Hurd bootstrap filesystem. serverboot uses this name
to locate the boot script (described above), and to initialize the
${root-device} script variable.
FIXME: finish
The most appealing use of the serverboot program is to start a
set of core Hurd servers while another Hurd is already running. You
will rarely need to do this, and it requires superuser privileges, but
it is interesting to note that it can be done.
Usually, you would make changes to only one server, and simply tell your programs to use it in order to test out your changes. This process can be applied even to the core servers. However, some changes have far-reaching effects, and so it is nice to be able to test those effects without having to reboot the machine.
Here are the steps you can follow to test out a new set of servers:
Note that it is impossible to share microkernel devices between the two
running Hurds, so don't get any funny ideas. When you're finished
testing your new Hurd, then you can run the halt or reboot
programs to return control to the parent Hurd.
If you're satisfied with your new Hurd, you can arrange for your bootloader to start it, and reboot your machine. Then, you'll be in a safe place to overwrite your old Hurd with the new one, and reboot back to your old configuration (with the new Hurd servers).
FIXME: finish
Every Hurd program accepts the following optional arguments:
The rest of this chapter provides a programmer's introduction to the Hurd. If you are not a programmer, then this chapter will not make much sense to you... you should consider skipping to descriptions of specific Hurd programs (see section Audience).
The Hurd distribution includes many libraries in order to provide a useful set of tools for writing Hurd utilities and servers. Several of these libraries are useful not only for the Hurd, but also for writing microkernel-based programs in general. These fundamental libraries are not difficult to understand, and they are a good starting point, because the rest of the Hurd relies upon them quite heavily.
All Hurd servers and libraries are aggressively multithreaded in order
to take full advantage of any multiprocessing capabilities provided by
the microkernel and the underlying hardware. The Hurd threads library,
libthreads, contains the default Hurd thread implementation, which
is declared in <cthreads.h>.
Currently (April 1998), the Hurd uses cthreads, which have already been documented thoroughly by CMU. Eventually, it will be migrated to use POSIX pthreads, which are widely documented.
Every single library in the Hurd distribution (including the GNU C library) is completely thread-safe, and the Hurd servers themselves are aggressively multithreaded.
A commonly asked question is whether the Hurd has been ported to the Open Group's version of the Mach microkernel. The answer is "no".
Currently (April 1998), the Hurd is quite dependent on the GNU Mach microkernel, which is a derivative of the University of Utah's Mach 4. However, the Hurd developers are all too aware of the limitations of Mach.
libmom is the first of several steps that need to be taken in
order to make the Hurd portable to other message-passing microkernels.
MOM stands for Microkernel Object Model, and is an
abstraction of the basic services provided by common message-passing
microkernels. It will provide the necessary insulation so that Hurd
servers and the C library can avoid making microkernel-dependent kernel
calls.
At present, though, libmom is still evolving, and will take
some time to be fully incorporated into the Hurd.
Ports are communication channels that are held by the kernel.
A port has separate send rights and receive rights, which may be transferred from task to task via the kernel. Port rights are similar to Unix file descriptors: they are per-task integers which are used to identify ports when making kernel calls. Send rights are required in order to send an RPC request down a port, and receive rights are required to serve the RPC request. Receive rights may be aggregated into a single portset, which serves as a useful organizational unit.
In a single-threaded RPC client, managing and categorizing ports is not a difficult process. However, in a complex multithreaded server, it is useful to have a more abstract interface to managing portsets, as well as maintaining server metadata.
The Hurd ports library, libports, fills that need. The
libports functions are declared in <hurd/ports.h>.
The libports bucket is simply a port set, with some
metadata and a lock. All of the libports functions operate on
buckets.
A port class is a collection of individual ports, which can be manipulated conveniently, and have enforced deallocation routines. Buckets and classes are entirely orthogonal: there is no requirement that all the ports in a class be in the same bucket, nor is there a requirement that all the ports in a bucket be in the same class.
Once you have created at least one bucket and class, you may create new ports, and store them in those buckets. There are a few different functions for port creation, depending on your application's requirements:
ports_create_port, except don't actually put the port
into the portset underlying bucket. This is intended to be used
for cases where the port right must be given out before the port is
fully initialized; with this call you are guaranteed that no RPC service
will occur on the port until you have finished initializing it and
installed it into the portset yourself.
ports_create_port.
The following functions move port receive rights to and from the port structure:
ports_reallocate_port and
ports_reallocate_from_external may not be used.
ports_destroy_right,
except that the receive right itself is not affected. Note that in
multi-threaded servers, messages might already have been dequeued for
this port before it gets removed from the portset; such messages will
get EOPNOTSUPP errors.
ports_destroy_right were called) and topt's old right is
destroyed (as if ports_reallocate_from_external were called).
It is important to point out that the port argument to each of
the libports functions is a void * and not a struct
port_info *. This is done so that you may add arbitrary
meta-information to your libports-managed ports. Simply define
your own structure whose first element is a struct port_info, and
then you can use pointers to these structures as the port argument
to any libports function.
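The embedding idiom described above can be illustrated with a self-contained sketch. Here struct port_info is a one-field stand-in (the real structure in <hurd/ports.h> has more members), and my_file_port, demo_port_refcnt and demo_file_offset are hypothetical names. The point is only that a pointer to the outer structure can be passed wherever a void * port argument is expected, because struct port_info is its first member.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for libports' struct port_info; the real definition in
   <hurd/ports.h> has many more fields.  Only the layout idea matters. */
struct port_info
{
  int refcnt;
};

/* Hypothetical server-specific structure.  struct port_info MUST be the
   first member, so a pointer to the whole structure is also a valid
   pointer to its port_info. */
struct my_file_port
{
  struct port_info pi;   /* first member */
  long file_offset;      /* hypothetical per-port metadata */
};

/* A libports-style function takes a void * port argument ... */
static int
demo_port_refcnt (void *port)
{
  struct port_info *pi = port;
  return pi->refcnt;
}

/* ... and the server converts the same pointer back to its own type. */
static long
demo_file_offset (void *port)
{
  struct my_file_port *fp = port;
  return fp->file_offset;
}
```

The C standard guarantees that a pointer to a structure, suitably converted, points to its first member, which is what makes this cast safe.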
The following functions are useful for maintaining metadata that is stored in your own custom ports structure:
These functions maintain references to ports so that the port
information structures may be freed if and only if they are no longer
needed. It is your responsibility to tell libports when
references to ports change.
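The reference-counting rule (free the port information only when nothing refers to it any longer) can be sketched as follows. This is a simplified model, not libports itself: struct demo_port and the demo_* functions are hypothetical, locking is omitted, and the dropweak_routine mechanism is ignored entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical port-info record with hard and weak reference counts. */
struct demo_port
{
  int hard_refs;
  int weak_refs;
  void (*clean) (struct demo_port *);  /* runs once, when the last ref drops */
};

static int demo_cleaned;  /* counts how many times a clean routine ran */

static void
demo_clean (struct demo_port *p)
{
  (void) p;
  demo_cleaned++;
}

static void
demo_port_ref (struct demo_port *p)
{
  p->hard_refs++;
}

static void
demo_port_ref_weak (struct demo_port *p)
{
  p->weak_refs++;
}

/* Drop one hard reference; clean up and free the structure only when
   no hard or weak references remain. */
static void
demo_port_deref (struct demo_port *p)
{
  assert (p->hard_refs > 0);
  if (--p->hard_refs == 0 && p->weak_refs == 0)
    {
      p->clean (p);
      free (p);
    }
}

/* Same rule for weak references. */
static void
demo_port_deref_weak (struct demo_port *p)
{
  assert (p->weak_refs > 0);
  if (--p->weak_refs == 0 && p->hard_refs == 0)
    {
      p->clean (p);
      free (p);
    }
}
```

Every holder that takes a reference must drop it exactly once; an unbalanced call either leaks the structure or frees it while it is still in use.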
ports_count_class) to
continue.
ports_count_bucket) to
continue.
Weak references are not often used, as they are the same as hard references for port classes where dropweak_routine is null. See section Buckets and Classes.
The rest of the libports functions are dedicated to controlling
RPC operations. These functions help you do all the locking and thread
cancellations that are required in order to build robust servers.
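The kind of bookkeeping these functions perform can be modelled as a gate. RPCs increment a counter when they begin and decrement it when they end, and an inhibitor blocks new RPCs and waits for the counter to reach zero. The sketch below uses POSIX threads. struct rpc_gate and the gate_* functions are hypothetical. Unlike libports, the sketch does not cancel RPCs in progress, and it does not exempt an RPC that calls gate_inhibit on itself (such a call would deadlock).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical gate, loosely modelled on beginning, ending, inhibiting
   and resuming RPCs on a port.  It only waits for RPCs; it never
   cancels them. */
struct rpc_gate
{
  pthread_mutex_t lock;
  pthread_cond_t changed;
  int active;      /* RPCs currently in progress */
  int inhibited;   /* nonzero while new RPCs must wait */
};

#define RPC_GATE_INIT \
  { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Called at the start of each RPC: wait until RPCs are allowed. */
static void
gate_begin_rpc (struct rpc_gate *g)
{
  pthread_mutex_lock (&g->lock);
  while (g->inhibited)
    pthread_cond_wait (&g->changed, &g->lock);
  g->active++;
  pthread_mutex_unlock (&g->lock);
}

/* Called when an RPC finishes; wakes an inhibitor waiting for quiescence. */
static void
gate_end_rpc (struct rpc_gate *g)
{
  pthread_mutex_lock (&g->lock);
  g->active--;
  pthread_cond_broadcast (&g->changed);
  pthread_mutex_unlock (&g->lock);
}

/* Block new RPCs, then wait for those in progress to finish. */
static void
gate_inhibit (struct rpc_gate *g)
{
  pthread_mutex_lock (&g->lock);
  while (g->inhibited)   /* only one inhibitor at a time */
    pthread_cond_wait (&g->changed, &g->lock);
  g->inhibited = 1;
  while (g->active > 0)
    pthread_cond_wait (&g->changed, &g->lock);
  pthread_mutex_unlock (&g->lock);
}

/* Undo gate_inhibit, letting blocked RPCs continue. */
static void
gate_resume (struct rpc_gate *g)
{
  pthread_mutex_lock (&g->lock);
  g->inhibited = 0;
  pthread_cond_broadcast (&g->changed);
  pthread_mutex_unlock (&g->lock);
}
```

A server would bracket each RPC with gate_begin_rpc and gate_end_rpc, and bracket a critical reorganization (for example, swapping a port's underlying object) with gate_inhibit and gate_resume.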
EDIED; otherwise we return zero.
ports_begin_rpc.
ports_inhibit_port_rpcs, but affects all ports in
class.
ports_inhibit_port_rpcs, but affects all ports in
bucket.
ports_inhibit_port_rpcs, but affects all ports
whatsoever.
ports_inhibit_port_rpcs for this
port, allowing blocked RPCs to continue.
ports_inhibit_class_rpcs for
class.
ports_inhibit_bucket_rpcs for
bucket.
ports_inhibit_all_rpcs.
thread_cancel) any RPCs in progress on port.
ports_interrupt_rpcs, return nonzero and clear the interrupted
flag.
hurd_cancel to be called on rpc's thread if
object gets notified that any of the things in what have
happened to port. rpc should be an RPC on object.
hurd_cancel to be called on the current thread, which
should be an RPC on object, if port gets notified with the
condition what.
ports_interrupt_self_on_notification with
what set to MACH_NOTIFY_DEAD_NAME.
ports_interrupt_notified_rpcs with what set
to MACH_NOTIFY_DEAD_NAME.
libihash provides integer-keyed hash tables, for arbitrary
element data types. Such hash tables are frequently used when
implementing sparse arrays or buffer caches.
The following functions are declared in <hurd/ihash.h>:
ENOMEM is returned, otherwise zero.
void **, and will be filled with a pointer that may be used as an argument
to ihash_locp_remove. The variable pointed to by locp may
be overwritten sometime between this call and when the element is
deleted, so you cannot stash its value elsewhere and hope to use the
stashed value with ihash_locp_remove. If a memory allocation
error occurs, ENOMEM is returned, otherwise zero.
ihash_iterate returns that value, otherwise it (eventually)
returns 0.
ihash_add. This call
should be faster than ihash_remove. ht can be null, in
which case the call still succeeds, but no cleanup is done.
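For readers unfamiliar with integer-keyed hash tables, here is a minimal separate-chaining sketch. All names (demo_ihash and its functions) are hypothetical and the signatures are not libihash's; only the ENOMEM-or-zero return convention is borrowed from the text. The removal function walks a pointer to the location that holds each entry, which is similar in spirit to the locp pointer described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical integer-keyed hash table with separate chaining.
   Names and signatures are illustrative only; they are not libihash's. */
struct demo_entry
{
  unsigned long key;
  void *value;
  struct demo_entry *next;
};

#define DEMO_NBUCKETS 64

struct demo_ihash
{
  struct demo_entry *buckets[DEMO_NBUCKETS];
};

static unsigned
demo_slot (unsigned long key)
{
  return key % DEMO_NBUCKETS;
}

/* Add or replace KEY; return ENOMEM on allocation failure, else zero. */
static int
demo_ihash_add (struct demo_ihash *ht, unsigned long key, void *value)
{
  struct demo_entry *e;
  for (e = ht->buckets[demo_slot (key)]; e; e = e->next)
    if (e->key == key)
      {
        e->value = value;
        return 0;
      }
  e = malloc (sizeof *e);
  if (!e)
    return ENOMEM;
  e->key = key;
  e->value = value;
  e->next = ht->buckets[demo_slot (key)];
  ht->buckets[demo_slot (key)] = e;
  return 0;
}

/* Return the value stored under KEY, or NULL if it is absent. */
static void *
demo_ihash_find (struct demo_ihash *ht, unsigned long key)
{
  struct demo_entry *e;
  for (e = ht->buckets[demo_slot (key)]; e; e = e->next)
    if (e->key == key)
      return e->value;
  return NULL;
}

/* Remove KEY; return nonzero if it was present. */
static int
demo_ihash_remove (struct demo_ihash *ht, unsigned long key)
{
  struct demo_entry **pp;
  for (pp = &ht->buckets[demo_slot (key)]; *pp; pp = &(*pp)->next)
    if ((*pp)->key == key)
      {
        struct demo_entry *dead = *pp;
        *pp = dead->next;   /* unlink through the stored location */
        free (dead);
        return 1;
      }
  return 0;
}
```

Keys 7 and 71 land in the same bucket of this 64-bucket table, so looking up either one exercises the chain.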
The GNU C library is constantly developing to meet the needs of the Hurd. However, because the C library needs to be very stable, it is irresponsible to add new functions to it without carefully specifying their interface, and testing them thoroughly.
The Hurd distribution includes a library called
libshouldbeinlibc, which serves as a proving ground for additions
to the GNU C library. This library is in flux, as some functions are
added to it by the Hurd developers and others are moved to the official
C library.
These functions aren't currently documented (other than in their header files), but complete documentation will be added to The GNU C Library Reference Manual when these functions become part of the GNU C library.
libhurdbugaddr exists only to define a single variable:
argp_program_bug_address is the default Hurd bug-reporting e-mail
address, bug-hurd@gnu.org. This address is displayed to the
user when any of the standard Hurd servers and utilities are invoked
using the `--help' option.
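A standalone program can get the same --help behaviour from the GNU C library's argp facility by defining argp_program_bug_address itself, which is the variable libhurdbugaddr supplies for Hurd programs. In this sketch the doc string and parse_demo_args are hypothetical. argp_parse and struct argp are the real argp interface declared in <argp.h>.

```c
#include <argp.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hurd programs get this definition from libhurdbugaddr; a standalone
   program can define the variable itself. */
const char *argp_program_bug_address = "bug-hurd@gnu.org";

/* Hypothetical one-line description shown by --help. */
static char doc[] = "demo -- show where argp gets its bug-report address";

/* No options and no parser: argp still provides --help and --usage. */
static struct argp demo_argp = { NULL, NULL, NULL, doc };

/* Parse the command line; returns zero on success.  With --help, argp
   prints the usage message (including the bug address) and exits. */
int
parse_demo_args (int argc, char **argv)
{
  return argp_parse (&demo_argp, argc, argv, 0, NULL, NULL);
}
```

Running such a program with --help shows the bug-report address at the end of the help text.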
There are no specific programs or servers associated with the I/O subsystem, since it is used to interact with almost all servers in the GNU Hurd. It provides facilities for reading and writing I/O channels, which are the underlying implementation of file and socket descriptors in the GNU C library.
The <hurd/iohelp.h> file declares several functions which are
useful for low-level I/O implementations. Most Hurd servers do not call
these functions directly, but they are used by several of the Hurd
filesystem and networking helper libraries. libiohelp requires
libthreads.
Most I/O servers need to implement some kind of user authentication
checking. In order to facilitate that process, libiohelp has
some functions which encapsulate a set of idvecs (FIXME: xref to C
library) in a single struct iouser.
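To make the idea of an iouser concrete, here is a hypothetical, simplified model. demo_idvec and demo_iouser stand in for the real struct idvec and struct iouser, and demo_user_may_own shows the kind of check a server might perform. Treating uid 0 as always permitted is an assumption borrowed from Unix convention; this section does not specify it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the C library's struct idvec
   and libiohelp's struct iouser: a user is a pair of id vectors. */
typedef unsigned int demo_id_t;

struct demo_idvec
{
  const demo_id_t *ids;
  size_t num;
};

struct demo_iouser
{
  struct demo_idvec uids;
  struct demo_idvec gids;
};

/* Return nonzero if ID appears anywhere in V. */
static int
demo_idvec_contains (const struct demo_idvec *v, demo_id_t id)
{
  for (size_t i = 0; i < v->num; i++)
    if (v->ids[i] == id)
      return 1;
  return 0;
}

/* Example check: may USER act as the owner (OWNER_UID) of an object?
   Assumption: uid 0 is always allowed. */
static int
demo_user_may_own (const struct demo_iouser *user, demo_id_t owner_uid)
{
  return demo_idvec_contains (&user->uids, 0)
         || demo_idvec_contains (&user->uids, owner_uid);
}
```

Because a Hurd user may hold several uids at once, the check scans the whole vector rather than comparing a single effective uid.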
I/O reauthentication is a rather complex protocol involving the
authserver as a trusted third party (see section Auth Protocol). In order
to reduce the risk of flawed implementations, I/O reauthentication is
encapsulated in the iohelp_reauth function:
If the transaction cannot be completed, return zero, unless permit_failure is nonzero. If permit_failure is nonzero, then should the transaction fail, return an iouser that has no ids. The new port to be sent to the user is newright.
The conch is at the heart of the shared memory I/O system.
Several Hurd libraries implement shared I/O, and so libiohelp
contains functions to facilitate conch management.
Everything about shared I/O is undocumented because it is not needed for adequate performance, and the RPC interface is simpler (see section I/O Interface). It is not useful for new libraries or servers to implement shared I/O.
The external pager (XP) microkernel interface allows applications to provide the backing store for a memory object, by converting hardware page faults into RPC requests. External pagers are required for memory-mapped I/O (see section Mapped Data) and stored filesystems (see section Stored Filesystems).
The external pager interface is quite complex, so the Hurd pager library
contains functions which aid in creating multithreaded external pagers.
libpager is declared in <hurd/pager.h>, and requires only
the threads and ports libraries.
The pager library defines the struct pager data type in order to
represent a multi-threaded pager. The general procedure for creating a
pager is to define the functions listed in section Pager Callbacks,
allocate a libports bucket for the ports which will access the
pager, and create at least one new struct pager with
pager_create.
libports, in bucket) and will be immediately ready to
receive requests. u_pager will be provided to later calls to
pager_find_address. The pager will have one user reference
created. may_cache and copy_strategy are the original
values of those attributes as for memory_object_ready. Users may
create references to pagers by use of the relevant ports library
functions. On errors, return null and set errno.
Once you are ready to turn over control to the pager library, you should
call ports_manage_port_operations_multithread on the
bucket, using pager_demuxer as the ports demuxer.
This will handle all external pager RPCs, invoking your pager callbacks
when necessary.
libports messages on pager ports.
The following functions are the body of the pager library, and provide a clean interface to pager functionality:
pager_sync writes all data; pager_sync_some only writes
data starting at start, for len bytes.
pager_flush flushes all data; pager_flush_some only
flushes data starting at start, for len bytes.
pager_return flushes and restores all data;
pager_return_some only flushes and restores data starting at
start, for len bytes.
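Each *_some variant operates on the byte range that begins at start and extends for len bytes. Since the kernel manages memory objects in whole pages, it can help to see which pages such a range overlaps. The helper below is hypothetical (it is not part of libpager) and assumes a power-of-two page size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the page-aligned region
   [*first, *first + *npages * pagesize) covering the byte range
   [start, start + len).  PAGESIZE must be a power of two (4096 on i386). */
static void
demo_page_span (unsigned long start, unsigned long len,
                unsigned long pagesize,
                unsigned long *first, unsigned long *npages)
{
  unsigned long mask = pagesize - 1;
  unsigned long begin = start & ~mask;               /* round down */
  unsigned long end = (start + len + mask) & ~mask;  /* round up */
  *first = begin;
  *npages = len ? (end - begin) / pagesize : 0;
}
```

For example, start = 4000 and len = 200 with 4096-byte pages overlaps two pages beginning at offset 0, even though only 200 bytes were requested.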
void pager_change_attributes (struct pager *pager, boolean_t may_cache, memory_object_copy_strategy_t copy_strategy, int wait)
Change the attributes of the memory object underlying pager pager.
The may_cache and copy_strategy arguments are as for
memory_object_change_attributes. Wait for the kernel to report
completion if and only if wait is set.
*size bytes between the region other
points to and the region at offset in the pager indicated by
pager and memobj. If prot is VM_PROT_READ,
copying is from the pager to other; if prot contains
VM_PROT_WRITE, copying is from other into the pager.
*size is always filled in with the actual number of bytes
successfully copied. Returns an error code if the pager-backed memory
faults; if there is no fault, returns zero and *size will
be unchanged.
These functions allow you to recover the internal struct pager
state, in case the libpager interface doesn't provide an
operation you need:
struct user_pager_info associated with a pager.
Like several other Hurd libraries, libpager depends on you to
implement application-specific callback functions. You must
define the following functions:
*buf to be the address of the page, and set
*write_lock if the page must be provided read-only. The
only permissible error returns are EIO, EDQUOT, and
ENOSPC.
vm_deallocate (or equivalent)
buf. The only permissible error returns are EIO,
EDQUOT, and ENOSPC.
*offset and
*size the minimum valid address the pager will accept and
the size of the object.
The I/O interface facilities are described in <hurd/io.defs>.
This section discusses only RPC-based I/O operations.(6)
The I/O server must associate each I/O port with a particular set of uids and gids, identifying the user who is responsible for operations on the port. Every port to an I/O server should also support either the file protocol (see section File Interface) or the socket protocol (see section Socket Interface); naked I/O ports are not allowed.
In addition, the server associates with each port a default file pointer, a set of open mode bits, a pid (called the "owner"), and some underlying object which can absorb data (for write) or provide data (for read).
The uid and gid sets associated with a port may not be visibly shared
with other ports, nor may they ever change. The server must fix the
identification of a set of uids and gids with a particular port at the
moment of the port's creation. The other characteristics of an I/O port
may be shared with other users. The I/O server interface does not
generally specify the way in which servers may share these other
characteristics (with the exception of the deprecated
O_ASYNC interface); however, the file and socket interfaces make
further requirements about what sharing is required and what sharing is prohibited.
In general, users get send rights to I/O ports by some mechanism that is
external to the I/O protocol. (For example, fileservers give out I/O
ports in response to the dir_lookup and fsys_getroot
calls. Socket servers give out ports in response to the
socket_create and socket_accept calls.) However, the I/O
protocol provides methods of obtaining new ports that refer to the same
underlying object as another port. In response to all of these calls,
all underlying state (including, but not limited to, the default file
pointer, open mode bits, and underlying object) must be shared between
the old and new ports. In the following descriptions of these calls,
the term "identical" means this kind of sharing. All these calls must
return send rights to a newly-constructed Mach port.
The io_duplicate call simply returns another port which is
identical to an existing port and has the same uid and gid set.
The io_restrict_auth call returns another port, identical to the
provided port, but which has a smaller associated uid and gid set. The
uid and gid sets of the new port are the intersection of the set on the
existing port and the lists of uids and gids provided in the call.
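The intersection rule for io_restrict_auth can be sketched in C as follows. restrict_ids is a hypothetical helper, not a Hurd library function; it treats the uid set as a plain array, and the same code applies unchanged to the gid set.

```c
#include <stddef.h>
#include <sys/types.h>

/* Return nonzero if ID appears in the LEN-element array IDS. */
static int
id_in (uid_t id, const uid_t *ids, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (ids[i] == id)
      return 1;
  return 0;
}

/* Store in OUT the ids present both in HAVE (the existing port's set)
   and in WANT (the list supplied with io_restrict_auth), without
   duplicates.  OUT must have room for NHAVE entries.  Returns the
   number of ids stored. */
size_t
restrict_ids (const uid_t *have, size_t nhave,
              const uid_t *want, size_t nwant, uid_t *out)
{
  size_t n = 0;

  for (size_t i = 0; i < nhave; i++)
    if (id_in (have[i], want, nwant) && !id_in (have[i], out, n))
      out[n++] = have[i];
  return n;
}
```

Because the result is an intersection, a user can only drop ids this way, never gain them; gaining ids requires io_reauthenticate.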
Users use the io_reauthenticate call when they wish to have an
entirely new set of uids or gids associated with a port. In response to
the io_reauthenticate call, the server must create a new port,
and then make the call auth_server_authenticate to the auth
server. The rendezvous port for the auth_server_authenticate
call is the I/O port on which the io_reauthenticate call was
made. The server passes the auth server a copy of the rend_int
parameter it received in the io_reauthenticate call. The I/O
server also gives the auth server a new port; this must be a newly
created port identical to the old port. The auth server returns the
set of uids and gids associated with the user, and guarantees that the
new port will go directly to the user that possessed the associated
authentication port. The server then associates the new port it gave
out with the returned IDs.
Users write to I/O ports by calling the io_write RPC. They
specify an offset parameter; if the object supports writing at
arbitrary offsets, the server should honour this parameter. If -1
is passed as the offset, then the server should use the default file
pointer. The server should return the amount of data which was
successfully written. If the operation was interrupted after some but
not all of the data was written, then it is considered to have succeeded
and the server should return the amount written. If the port is not an
I/O port at all, the server should reply with the error
EOPNOTSUPP. If the port is an I/O port, but does not happen to
support writing, then the correct error is EBADF.
Users read from I/O ports by calling the io_read RPC. They
specify the amount of data they wish to read, and the offset. The offset
has the same meaning as for io_write above. The server should
return the data that was read. If the call is interrupted after some
data has been read (and the operation is not idempotent) then the server
should return the amount read, even if it was less than the amount requested.
The server should return as much data as possible, but never more than
requested by the user. If there is no data, but there might be later,
the call should block until data becomes available. The server indicates
end-of-file by returning zero bytes. If the call is
interrupted after some data has been read, but the call is idempotent,
then the server may return EINTR rather than actually filling the
buffer (taking care that any modifications of the default file pointer
have been reversed). Preferably, however, servers should return data.
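A minimal sketch of the offset conventions for io_write and io_read, assuming a hypothetical seekable object held in a fixed 64-byte buffer (struct mem_obj, mem_read and mem_write are illustrative names, and the object is assumed to start zero-filled). It shows an offset of -1 selecting and advancing the default file pointer, reads never returning more than requested, a zero-byte read signalling end-of-file, and a write that only partly fits succeeding with a short count.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical seekable object held in memory. */
struct mem_obj
{
  char buf[64];
  size_t size;          /* current file size */
  off_t filepointer;    /* default file pointer */
};

/* Read at most AMOUNT bytes into DATA.  An OFFSET of -1 means "use and
   advance the default file pointer".  Returns the number of bytes read;
   0 means end-of-file. */
size_t
mem_read (struct mem_obj *obj, off_t offset, char *data, size_t amount)
{
  int use_fp = offset == -1;
  off_t start = use_fp ? obj->filepointer : offset;
  size_t n = 0;

  if (start >= 0 && (size_t) start < obj->size)
    {
      n = obj->size - (size_t) start;
      if (n > amount)
        n = amount;             /* never more than requested */
      memcpy (data, obj->buf + start, n);
    }
  if (use_fp)
    obj->filepointer += (off_t) n;
  return n;
}

/* Write LEN bytes from DATA with the same OFFSET convention, storing
   the amount actually written in *AMOUNT.  Returns 0, or ENOSPC if
   nothing at all fits. */
int
mem_write (struct mem_obj *obj, off_t offset, const char *data,
           size_t len, size_t *amount)
{
  int use_fp = offset == -1;
  off_t start = use_fp ? obj->filepointer : offset;

  if (start < 0 || (size_t) start >= sizeof obj->buf)
    return ENOSPC;
  size_t n = sizeof obj->buf - (size_t) start;
  if (n > len)
    n = len;                    /* short write still succeeds */
  memcpy (obj->buf + start, data, n);
  if ((size_t) start + n > obj->size)
    obj->size = (size_t) start + n;
  if (use_fp)
    obj->filepointer += (off_t) n;
  *amount = n;
  return 0;
}
```

Interruption, blocking and locking between ports that share the object are deliberately left out of this sketch.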
There are two categories of objects: seekable and non-seekable.
Seekable objects must accept arbitrary offset parameters in the
io_read and io_write calls, and must implement the
io_seek call. Non-seekable objects must ignore the offset
parameters to io_read and io_write, and should return
ESPIPE to the io_seek call.
On seekable objects, io_seek changes the default file pointer for
reads and writes. (See section `File Positioning' in The GNU C Library Reference Manual
for the interpretation of the whence and offset arguments.)
It returns the new value of the default file pointer.
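The pointer computation for io_seek can be sketched as below. obj_seek and struct seek_obj are hypothetical; returning EINVAL for an unknown whence or a negative result follows the usual lseek conventions and is an assumption, since this section does not specify those errors.

```c
#include <errno.h>
#include <stdio.h>       /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <sys/types.h>

/* Hypothetical object state relevant to seeking. */
struct seek_obj
{
  int seekable;
  off_t filepointer;
  off_t size;
};

/* Update the default file pointer as io_seek would and store the new
   value in *NEWP.  Non-seekable objects yield ESPIPE. */
int
obj_seek (struct seek_obj *obj, off_t offset, int whence, off_t *newp)
{
  off_t base;

  if (!obj->seekable)
    return ESPIPE;
  switch (whence)
    {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = obj->filepointer; break;
    case SEEK_END: base = obj->size; break;
    default: return EINVAL;
    }
  if (base + offset < 0)
    return EINVAL;              /* pointer left unchanged */
  obj->filepointer = base + offset;
  *newp = obj->filepointer;
  return 0;
}
```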
The io_readable interface returns the amount of data which can be
immediately read. For the special technical meaning of "immediately",
see section Asynchronous I/O.
The server associates each port with a set of bits that affect its
operation. The io_set_all_openmodes call modifies these bits and
the io_get_openmodes call returns them. In addition, the
io_set_some_openmodes and io_clear_some_openmodes calls do an
atomic read/modify/write of the openmodes.
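One way to get the atomic read/modify/write behaviour of io_set_some_openmodes and io_clear_some_openmodes is to keep the mode word in a C11 atomic; a server could equally protect it with a mutex. The struct and function names below are hypothetical, and the O_* bits are taken from the host's <fcntl.h>.

```c
#include <fcntl.h>       /* O_* bits stored in the word */
#include <stdatomic.h>

/* Hypothetical per-port open-mode word. */
struct modes
{
  atomic_int bits;
};

/* Models io_set_all_openmodes: replace the whole word. */
void
set_all_openmodes (struct modes *m, int bits)
{
  atomic_store (&m->bits, bits);
}

/* Models io_get_openmodes: read the current word. */
int
get_openmodes (struct modes *m)
{
  return atomic_load (&m->bits);
}

/* Models io_set_some_openmodes and io_clear_some_openmodes: each is a
   single atomic read/modify/write, so concurrent callers cannot lose
   each other's updates. */
void
set_some_openmodes (struct modes *m, int bits)
{
  atomic_fetch_or (&m->bits, bits);
}

void
clear_some_openmodes (struct modes *m, int bits)
{
  atomic_fetch_and (&m->bits, ~bits);
}
```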
The O_APPEND bit, when set, changes the behaviour of
io_write when it uses the default file pointer on seekable
objects. When io_write is done on a port with the
O_APPEND bit set, it must set the file pointer to the current
file size before doing the write (which would then increment the file
pointer as usual). The current file size is the smallest offset
which returns end-of-file when provided to io_read. The server
must atomically bind this update to the actual data write with respect
to other users of io_read, io_write, and io_seek.
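The O_APPEND rule can be sketched as follows, assuming a hypothetical in-memory object (struct app_obj, write_at_fp). Locking is left out: the sketch assumes the caller already holds a per-object lock that io_read and io_seek also take, which is what would make the pointer update and the data write a single atomic step.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical seekable object. */
struct app_obj
{
  char buf[64];
  size_t size;          /* current file size */
  off_t filepointer;    /* default file pointer, assumed >= 0 */
  int openmodes;
};

/* io_write at the default file pointer.  Caller holds the object lock.
   Returns the number of bytes written. */
size_t
write_at_fp (struct app_obj *obj, const char *data, size_t len)
{
  size_t n = 0;

  if (obj->openmodes & O_APPEND)
    obj->filepointer = (off_t) obj->size;   /* jump to end-of-file first */

  if ((size_t) obj->filepointer < sizeof obj->buf)
    {
      n = sizeof obj->buf - (size_t) obj->filepointer;
      if (n > len)
        n = len;
      memcpy (obj->buf + obj->filepointer, data, n);
    }
  obj->filepointer += (off_t) n;            /* then advance as usual */
  if ((size_t) obj->filepointer > obj->size)
    obj->size = (size_t) obj->filepointer;
  return n;
}
```

Even after the pointer has been moved back (for example by io_seek), the next append still lands at the end of the file.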
The O_FSYNC bit, when set, guarantees that io_write will
not return until data is fully written to the underlying medium.
The O_NONBLOCK bit, when set, prevents read and write from
blocking. They should copy such data as is immediately available. If
no data is immediately available they should return EWOULDBLOCK.
The definition of "immediately" is more or less server-dependent. Some servers, notably stored filesystem servers (see section Stored Filesystems), regard all data as immediately available. The one criterion is that something which must happen immediately may not wait for any user-synchronizable event.
The O_ASYNC bit is deprecated; its use is documented in the
following section. This bit must be shared between all users of the
same underlying object.
Users may wish to be notified when I/O can be done without blocking;
they use the io_async call to indicate this to the server. In
the io_async call the user provides a port on which the
server should send sig_post messages as I/O becomes possible.
The server must return a port which will be the reference port in the
sig_post messages. Each io_async call should generate a
new reference port. (FIXME: xref the C library manual for information
on how to send sig_post messages.)
The server then sends one SIGIO signal to each registered async
user every time I/O becomes possible. I/O is possible if at least one
byte can be read or written immediately. The definition of
"immediately" must be the same as for the implementation of the
O_NONBLOCK flag (see section Open Modes). In addition, every time a
user calls io_read or io_write on a non-seekable object, or at the
default file pointer on a seekable object, another signal should be sent
to each user if I/O is still possible.
Some objects may also define "urgent" conditions. Such servers should
send the SIGURG signal to each registered async user any time an
urgent condition appears. After any RPC that has the possibility of
clearing the urgent condition, the server should again send the signal
to all registered users if the urgent condition is still present.
A more fine-grained mechanism for doing async I/O is the
io_select call. The user specifies the kind of access desired,
and a send-once right. If I/O of the kind the user desires is
immediately possible, then the server should reply saying so, and
destroy the send-once right. If I/O is not immediately possible, the
server should save the send-once right, and send a select_done
message as soon as I/O becomes immediately possible. Again, the
definition of "immediately" must be the same for io_select,
io_async, and O_NONBLOCK (see section Open Modes).
For compatibility with 4.2 and 4.3 BSD, the I/O interface provides a
deprecated feature (known as icky async I/O). The calls
io_mod_owner and io_get_owner set the "owner" of the
object, providing either a pid or a pgrp (if the value is negative).
This implies that only one process at a time can do icky I/O on a given
object. Whenever the I/O server is sending sig_post messages to
all the io_async users, if the O_ASYNC bit is set, the
server should also send a signal to the owning pid/pgrp. The ID port
for this call should be different from all the io_async ID ports
given to users. Users may find out what ID port the server uses for
this by calling io_get_icky_async_id.
Users may call io_stat to find out information about the I/O
object. Most of the fields of a struct stat are meaningful only
for files. All objects, however, must support the fields
st_fstype, st_fsid, st_ino, st_atime,
st_atime_usec, st_mtime, st_mtime_usec, st_ctime,
st_ctime_usec, and st_blksize.
st_fstype, st_fsid, and st_ino must be unique for the underlying object across the entire system.
st_atime and st_atime_usec hold the seconds and
microseconds, respectively, of the system clock at the last time the
object was read with io_read.
st_mtime and st_mtime_usec hold the seconds and microseconds,
respectively, of the system clock at the last time the object was
written with io_write.
Other appropriate operations may update the atime and the mtime as well; both the file and socket interfaces specify such operations.
st_ctime and st_ctime_usec hold the seconds and microseconds, respectively, of the system clock at the last time permanent meta-data associated with the object was changed. The exact operations which cause such an update are server-dependent, but must include the creation of the object.
The server is permitted to delay the actual update of these times until stat is called; before the server stores the times on permanent media (if it ever does so) it should update them if necessary.
st_blksize gives the optimal I/O size in bytes for io_read
and io_write; users should endeavor to read and write amounts
which are multiples of the optimal size, and to use offsets which are
multiples of the optimal size.
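A client that wants to follow this advice might widen each request to block boundaries before issuing io_read. align_to_blksize below is a hypothetical helper that assumes a non-negative offset and a nonzero blksize.

```c
#include <stddef.h>
#include <sys/types.h>

/* Widen the byte range [OFFSET, OFFSET + LEN) so that it starts and
   ends on multiples of BLKSIZE (the st_blksize reported by io_stat). */
void
align_to_blksize (off_t offset, size_t len, size_t blksize,
                  off_t *aligned_offset, size_t *aligned_len)
{
  off_t bs = (off_t) blksize;
  off_t start = offset - offset % bs;           /* round start down */
  off_t end = offset + (off_t) len;

  if (end % bs != 0)
    end += bs - end % bs;                       /* round end up */
  *aligned_offset = start;
  *aligned_len = (size_t) (end - start);
}
```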
In addition, objects which are seekable should set st_size to the
current file size as in the description of the O_APPEND flag
(see section Open Modes).
The st_uid and st_gid fields are unrelated to the "owner" as described above for icky async I/O.
Users may find out the version of the server they are talking to by
calling io_server_version; this should return strings and
integers describing the version number of the server, as well as its
name.
Servers may optionally implement the io_map call. The ports
returned by io_map must implement the external pager kernel
interface (see section Pager Library) and be suitable as arguments to
vm_map.
Seekable objects must allow access from zero up to (but not including)
the current file size as described for O_APPEND (see section Open Modes). Whether they provide access beyond such a point is
server-dependent; in addition, the meaning of accessing a non-seekable
object is server-dependent.
A file is traditionally thought of as a quantity of disk storage. In the Hurd, files are an extension of the I/O interface, but they do not necessarily correspond to disk storage.
Every file in the Hurd is represented by a port, which is connected to the server that manages the file. When a client wants to operate on a file, it makes RPC requests via a file port to its server process, which is commonly called a translator.
The Hurd filesystem allows you to set translators on any file or directory that you own. A translator is any Hurd server which provides the basic filesystem interface. Translated nodes are somewhat like a cross between Unix symbolic links and mount points.
Whenever a program tries to access the contents of a translated node, the filesystem server redirects the request to the appropriate translator (starting it if necessary). Then, the new translator services the client's request. The GNU C library makes this behaviour seamless from the client's perspective, so that standard Unix programs behave correctly under the Hurd.
Translators run with the privileges of the translated node's owner, so they cannot be used to compromise the security of the system. This also means that any user can write their own translators, and provide other users with arbitrary filesystem-structured data, regardless of the data's actual source. Other chapters in this manual describe existing translators, and how you can modify them or write your own.
The standard Hurd filesystem servers are constantly evolving to provide innovative features that users want. Here are a few examples of existing translators:
ext2fs, ufs, and
isofs (see section Stored Filesystems).
nfs and ftpfs
(see section Distributed Filesystems).
pflocal implements the
filesystem interfaces, but it also provides a special Unix-domain socket
RPC interface (FIXME xref). Programs can fetch a port to this
translator simply by calling file_name_lookup (FIXME xref) on
`/servers/socket/1'(7), then use Unix
socket-specific RPCs on that port, rather than adhering to the file
protocol.
This section focuses on the generic programs that you need to understand in order to use existing translators. Many other parts of this manual describe how you can write your own translators.
settrans
The settrans program allows you to set a translator on a file or
directory. By default, the passive translator is set (see the
`--passive' option).
The settrans program has the following synopsis:
settrans [option]... node [translator arg...]
where translator is the absolute filename of the new translator
program. Each arg is passed to translator when it starts.
If translator is not specified, then settrans clears the
existing translator rather than setting a new one.
settrans accepts the following options:
FIXME: finish
showtrans
mount
fsysopts
Certain translators do not need to be very complex, because they
represent a single file rather than an entire directory hierarchy. The
trivfs library, which is declared in <hurd/trivfs.h>, does most of
the work of implementing this kind of translator. This library requires
the iohelp and ports libraries.
In order to use the trivfs library, you will need to define the appropriate callbacks (see section Trivfs Callbacks). As with all Hurd servers, your trivfs-based translator should first parse any command-line options, in case the user is just asking for help. Trivfs uses argp (see section `Argp' in The GNU C Library Reference Manual) for parsing command-line arguments.
Your translator should redefine the following functions and variables as
necessary, and then call argp_parse with the relevant arguments:
trivfs_set_options to handle runtime options parsing.
Redefining this is the normal way to add option parsing to a trivfs
program.
*argz of length
*argz_len a NUL-separated list of the arguments to this
translator.
After your translator parses its command-line arguments, it should fetch
its bootstrap port by using task_get_bootstrap_port. If this
port is MACH_PORT_NULL, then your program wasn't started as a
translator. Otherwise, you can use the bootstrap port to create a new
control structure (and advertise its port) with trivfs_startup:
trivfs_startup creates a new trivfs control port, advertises it
to the underlying node bootstrap with fsys_startup,
returning the results of this call, and places its control structure in
*control. trivfs_create_control does the same
thing, except it doesn't advertise the control port to the underlying
node. control_class and control_bucket are passed to
libports to create the control port, and protid_class and
protid_bucket are used when creating ports representing opens of
this node; any of these may be zero, in which case an appropriate port
class/bucket is created. If control is non-null, the trivfs
control port is returned in it. flags (a bitmask of the
appropriate O_* constants) specifies how to open the underlying
node.
If you did not supply zeros as the class and bucket arguments to
trivfs_startup, you will probably need to use the trivfs port
management functions (see section Trivfs Ports).
Once you have successfully called trivfs_startup, and have a
pointer to the control structure stored in, say, the fsys
variable, you are ready to call one of the
ports_manage_port_operations_* functions using
fsys->pi.bucket and trivfs_demuxer. This will
handle any incoming filesystem requests, invoking your callbacks when
necessary.
trivfs_demuxer demultiplexes incoming libports messages on trivfs ports.
The following functions are not usually necessary, but they allow you to
use the trivfs library even when it is not possible to turn
message-handling over to trivfs_demuxer and libports:
intran functions for a MiG port
type to have the stubs called with either the control or protid pointer.
Like several other Hurd libraries, libtrivfs requires that you
define a number of application-specific callback functions and
configuration variables. You must define the following variables
and functions:
struct stat. trivfs_fstype should be chosen
from the FSTYPE_* constants found in <hurd/hurd_types.h>.
O_READ,
O_WRITE, and O_EXEC; trivfs will only allow opens of the
specified modes.
struct stat (as returned from the underlying
node) for presentation to callers of io_stat. It is permissible
for this function to do nothing, but it must still be defined.
FSYS_GOAWAY_* found in
<hurd/hurd_types.h>.
The functions and variables described in this subsection already have
default definitions in libtrivfs, so you are not forced to define
them; rather, they may be redefined on a case-by-case basis.
trivfs_create_control (or trivfs_startup), which will
automatically be recognized.
O_NONBLOCK is set in flags. Any desired error can be
returned, which will be reflected to the user and will prevent the open from
succeeding.
trivfs_S_fsys_getroot
before any other processing takes place. If the return value is
EAGAIN, normal trivfs getroot processing continues, otherwise the
RPC returns with that return value.
If you choose to allocate your own trivfs port classes and buckets, the following functions may come in handy:
*bucket to the list of dynamically allocated
port buckets; if *bucket is zero, an attempt is
made to allocate a new port bucket, which is then stored in
*bucket.
trivfs_add_port_bucket.
*class to the list of control or protid port
classes recognized by trivfs; if *class is zero, an attempt is
made to allocate a new port class, which is stored in *class.
trivfs_add_control_port_class or
trivfs_add_protid_port_class.
Even if you do not use the above allocation functions, you may still be able to use the default trivfs cleanroutines:
libports cleanroutines for
control port classes and protid port classes, respectively.
The fshelp library implements various things that are useful to most
implementors of the file protocol. It presumes that you are using the
iohelp library as well. libfshelp is divided into separate
facilities which may be used independently. These functions are
declared in <hurd/fshelp.h>.
These routines are self-contained and start passive translators, returning the control port. They do not require multithreading or the ports library.
*control. If the translator doesn't
respond or die in timeout milliseconds (if timeout is
greater than zero), return an appropriate error. If the translator dies
before responding, return EDIED.
fshelp_start_translator_long, except the initports and
ints are copied from our own state, fd[2] is copied from our own
stderr, and the other fds are cleared. For full-service filesystems, it
is almost always wrong to use fshelp_start_translator, because
the current working directory of the translator will not then be as
normally expected. (Current working directories of passive translators
should be the directory they were found in.) In fact, full-service
filesystems should usually start passive translators as a side-effect of
calling fshelp_fetch_root (see section Active Translator Linkage).
These routines implement the linkage to active translators needed by any filesystem which supports them. They require the threads library and use the passive translator routines above, but they don't require the ports library at all.
This interface is complex, because creating the ports and state
necessary for start_translator_long is expensive. The caller to
fshelp_fetch_root should not need to create them on every call,
since usually there will be an existing active translator.
fshelp_fetch_root to fetch more
information. Return the owner and group of the underlying translated
file in *uid and *gid; point
*argz at the entire passive translator specification for
the file (setting *argz_len to the length). If there is no
passive translator, then return ENOENT. cookie1 is the
cookie passed in fshelp_transbox_init. cookie2 is the
cookie passed in the call to fshelp_fetch_root.
fshelp_fetch_root to fetch more
information. Return an unauthenticated node for the file itself in
*underlying and *underlying_type (opened with
flags). cookie1 is the cookie passed in
fshelp_transbox_init. cookie2 is the cookie passed in the
call to fshelp_fetch_root.
dir_pathtrans (but O_CREAT and O_EXCL are not
meaningful and are ignored). The transbox lock (as set by
fshelp_transbox_init) must be held before the call, and will be
held upon return, but may be released during the operation of the call.
EBUSY instead.
fsys_getroot with the result; use fshelp_fetch_root
instead.
The flock call is in flux, as the current Hurd interface (as of
version 0.2) is not suitable for implementing the POSIX
record-locking semantics.
These functions are designed to aid with user permission checking. It is a good idea to use these routines rather than to roll your own, so that Hurd users see consistent handling of file and directory permission bits.
S_IREAD,
S_IWRITE, and S_IEXEC. If the access is permitted, return
zero; otherwise return an appropriate error code.
The following functions are completely standalone:
_servers to argv[0].
ports[INIT_PORT_AUTH], and replaces it with the result.
All the other ports in ports and fds are then
reauthenticated, using any privileges available through auth. If
the auth port in ports[INIT_PORT_AUTH] is bogus, and
get_file_ids is non-null, it is called to get a list
of uids and gids from the file to use as a replacement. If secure
is non-null and any added ids are new, then the variable it points to is
set to nonzero, otherwise zero. If either the uid or gid case fails,
then the other may still apply.
*pt for the node numbered
fileno, suitable for returning from io_identity; exactly
one send right must be created from the returned value. fileno
should be the same value returned as the fileno out-parameter in
io_identity, and in the enclosing directory (except for mount
points), and in the st_ino stat field. bucket should be a
libports port bucket; fshelp requires the caller to make sure
port operations (for no-senders notifications) are used.
argp_parse in the standard way, with data from argz
and argz_len.
TOUCH_ATIME, TOUCH_MTIME, and TOUCH_CTIME
constants.
This section documents the interface for operating on files.
The file interface is a superset of the I/O interface (see section I/O Interface). Servers which provide the file interface are required to
support the I/O interface as well. All objects reachable in the
filesystem are expected to provide the file interface, even if they do
not contain data. (The trivfs library makes it easy to do so for
ordinary sorts of cases. See section Trivfs Library.)
The interface definitions for the file interface are found in
<hurd/fs.defs>.
Files have various pieces of status information which are returned by
io_stat (see section Information Queries). Most of this status
information can be directly changed by various calls in the file
interface; some of it should vary implicitly as the contents of the file
change.
Many of these calls have general rules associated with them describing
how security and privilege should operate. The diskfs library
(see section Diskfs Library) implements these rules for stored filesystems.
These rules have also been implemented in the fshelp library
(see section Fshelp Library). Trivfs-based servers generally have no need
to implement these rules at all.
In special cases, there may be a reason to implement a different security check from that specified here, or to implement a call to do something slightly different. But such cases must be carefully considered; make sure that you will not confuse innocent user programs through excessive cleverness.
If some operation cannot be implemented (for example, chauthor
over FTP), then the call should return EOPNOTSUPP. If it is
merely difficult to implement a call, it is much better to figure out a
way to implement it as a series of operations rather than to return
errors to the user.
There are several RPCs available for users to change much of the status
information associated with a file. (The information is returned by the
io_stat RPC; see section Information Queries.)
All these operations are restricted to root and the owner of the file.
When attempted by another user, they should return EPERM.
The file_chown RPC changes the owner and group of the file. Only
root should be able to change the owner, and changing the group to a
group the caller is not in should also be prohibited. Violating either
of these conditions should return EPERM.
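These rules for file_chown can be collected into a single predicate. chown_allowed and struct who are hypothetical names; the sketch also assumes that leaving the group unchanged is always acceptable for the owner, which the text above does not state explicitly.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical caller identity. */
struct who
{
  uid_t uid;
  const gid_t *gids;
  size_t ngids;
};

/* May W change a file owned by CUR_UID/CUR_GID to NEW_UID/NEW_GID?
   Returns 0 or EPERM. */
int
chown_allowed (const struct who *w, uid_t cur_uid, gid_t cur_gid,
               uid_t new_uid, gid_t new_gid)
{
  int in_group = 0;

  if (w->uid == 0)
    return 0;                   /* root may do anything */
  if (w->uid != cur_uid)
    return EPERM;               /* only root and the owner */
  if (new_uid != cur_uid)
    return EPERM;               /* only root changes the owner */
  if (new_gid != cur_gid)
    {
      for (size_t i = 0; i < w->ngids; i++)
        if (w->gids[i] == new_gid)
          in_group = 1;
      if (!in_group)
        return EPERM;           /* caller is not in the new group */
    }
  return 0;
}
```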
The file_chauthor RPC changes the author of the file. It should
be legitimate to change the author to any value without restriction.
The file_chmod RPC changes the file permission mode bits.
The file_chflags RPC changes the flags of the file. It should be
legitimate to change the flags to any value without restriction. No
standard meanings have been assigned to the flags yet, but we intend to
do so. Do not assume that the flags format we choose will map
identically to that of some existing filesystem format.
The file_utimes RPC changes the atime and mtime of
the file. Making this call must cause the ctime to be updated as
well, even if no actual change to either the mtime or the
atime occurs.
The file_set_size RPC is special; not only does it change the
status word specifying the size of the file, but it also changes the
actual contents of the file. If the file size is being reduced it
should release secondary storage associated with the previous contents
of the file. If the file is being extended, the new region added to the
file must be zero-filled. Unlike the other RPCs in this section,
file_set_size should be permitted to any user who is allowed to
write the file.
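For a file whose contents live in a heap buffer, the two halves of file_set_size (release on shrink, zero-fill on growth) might look like this. struct mfile and mfile_set_size are hypothetical; the write-permission check described above is left to the caller.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical file whose contents are kept in memory. */
struct mfile
{
  char *data;
  size_t size;
};

/* Change the size of F to NEWSIZE.  Shrinking releases the storage past
   the new end; growing zero-fills the added region.  Returns 0 or
   ENOMEM (in which case F is unchanged). */
int
mfile_set_size (struct mfile *f, size_t newsize)
{
  char *p;

  if (newsize == 0)
    {
      free (f->data);
      f->data = NULL;
      f->size = 0;
      return 0;
    }
  p = realloc (f->data, newsize);
  if (p == NULL)
    return ENOMEM;
  if (newsize > f->size)
    memset (p + f->size, 0, newsize - f->size);   /* new region reads as zeros */
  f->data = p;
  f->size = newsize;
  return 0;
}
```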
Execution of programs on the Hurd is done through fileservers with the
file_exec RPC. The fileserver is expected to verify that the
user is allowed to execute the file, make whatever modifications to the
ports are necessary for setuid execution, and then invoke the standard
execserver found on `/servers/exec'.
This section specifically addresses what fileservers are expected to do, with minimal attention to the other parts of the process. See section Running Programs, for more general information.
The file must be opened for execution; if it is not, EBADF should
be returned. In addition, at least one of the execute bits must be on. A
failure of this check should result in EACCES, not
ENOEXEC. It is not proper for the fileserver ever to respond to
the file_exec RPC with ENOEXEC.
If either the setuid or setgid bits are set, the server needs to
construct a new authentication handle with the additional new ID's.
Then all the ports passed to file_exec need to be reauthenticated
with the new handle. If the fileserver is unable to make the new
authentication handle (for example, because it is not running as root)
it is not acceptable to return an error; in such a case the server
should simply silently fail to implement the setuid/setgid semantics.
If the setuid/setgid transformation adds a new uid or gid to the user's
authentication handle that was not previously present (as opposed to
merely reordering them), then the EXEC_SECURE and
EXEC_NEWTASK flags should both be added in the call to
exec_exec.
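Deciding whether the transformation added a genuinely new ID amounts to a set comparison that ignores order. In the sketch below, setuid_exec_flags is hypothetical, and MODEL_EXEC_SECURE and MODEL_EXEC_NEWTASK are placeholders for the real EXEC_SECURE and EXEC_NEWTASK, whose values come from the Hurd headers. The same test would be run on the gid list.

```c
#include <stddef.h>
#include <sys/types.h>

/* Placeholders for EXEC_SECURE and EXEC_NEWTASK. */
#define MODEL_EXEC_SECURE  0x1
#define MODEL_EXEC_NEWTASK 0x2

/* Return nonzero if ID is one of the LEN ids in IDS. */
static int
has_id (uid_t id, const uid_t *ids, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (ids[i] == id)
      return 1;
  return 0;
}

/* Given the user's uids before and after the setuid transformation,
   return the extra flags for exec_exec.  A mere reordering adds
   nothing; any id not previously present adds both flags. */
int
setuid_exec_flags (const uid_t *before, size_t nbefore,
                   const uid_t *after, size_t nafter)
{
  for (size_t i = 0; i < nafter; i++)
    if (!has_id (after[i], before, nbefore))
      return MODEL_EXEC_SECURE | MODEL_EXEC_NEWTASK;
  return 0;
}
```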
The server then needs to open a new port onto the executed file which
will not share any file pointers with the port the user passed in,
opened with O_READ. Finally, all the information (mutated
appropriately for setuid/setgid) should be sent to the execserver with
exec_exec. Whatever error code exec_exec returns should
be returned to the caller of file_exec.
The flock call is in flux, as the current Hurd interface (as of
version 0.2) is not suitable for implementing the POSIX
record-locking semantics.
You should ignore the file_lock and file_lock_stat calls
until the new record-locking interface is implemented.
FIXME: Other active calls on files
file_sync
file_getfh
file_getlinknode
file_check_access
These manipulate meta-information:
file_reparent
file_statfs
file_syncfs
file_getcontrol
file_get_storage_info
file_get_fs_options
FIXME: Looking up files in directories
dir_lookup
dir_readdir
FIXME: Creating and deleting nodes
dir_mkfile
dir_mkdir
dir_rmdir
dir_unlink
dir_link
dir_rename
FIXME: File and directory change callbacks
File change notifications are not yet implemented, but directory notifications are.
file_notice_changes
dir_notice_changes
FIXME: How to set and get translators
file_set_translator
file_get_translator
file_get_translator_cntl
The filesystem interface (described in <hurd/fsys.defs>) is
supported by translator control ports.
FIXME: finish
In Unix, any file that does not act as a general-purpose unit of storage is called a special file. These are FIFOs, Unix-domain sockets, and device nodes. In the Hurd, there is no need for the "special file" distinction, since they are implemented by translators, just as regular files are.
Nevertheless, the Hurd maintains this distinction, in order to provide backward compatibility for Unix programs (which do not know about translators). Studying the implementation of Hurd special files is a good way to introduce the idea of translators to people who are familiar with Unix.
This chapter does not discuss `/dev/zero' or any of the microkernel-based devices, since these are translated by the generalized storeio server (FIXME xref).
FIXME: finish
FIXME: a chapter on libtreefs and libdirmgt will probably go here
A store is a fixed-size block of storage, which can be read and perhaps written to. A store is more general than a file: it refers to any type of storage such as devices, files, memory, tasks, etc. Stores can also be representations of other stores, which may be combined and filtered in various ways.
FIXME: finish
The store library (which is declared in <hurd/store.h>)
implements many different backends which support the store abstraction.
Hurd programs use libstore so that new storage types can be
implemented with minimum impact.
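The following C sketch illustrates the idea of a store with pluggable backends. Generic read and write entry points do the bounds checking once, and each storage type supplies only its raw I/O hooks through a small class structure. Every name in it (demo_store, demo_store_class, demo_mem_class and the helpers) is hypothetical and is not the libstore API declared in <hurd/store.h>. It uses byte offsets for simplicity and treats a missing write hook as "read-only".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_store;

/* Hypothetical backend "class": the hooks a new storage type supplies. */
struct demo_store_class
{
  const char *name;
  int (*read) (struct demo_store *s, size_t off, void *buf, size_t len);
  /* NULL means the backend can only be read.  */
  int (*write) (struct demo_store *s, size_t off, const void *buf, size_t len);
};

/* Hypothetical store: a fixed number of fixed-size blocks.  */
struct demo_store
{
  const struct demo_store_class *cls;
  size_t block_size;   /* bytes per block */
  size_t blocks;       /* fixed size of the store, in blocks */
  void *hook;          /* backend-private state */
};

static int
in_range (const struct demo_store *s, size_t off, size_t len)
{
  size_t size = s->block_size * s->blocks;
  return off <= size && len <= size - off;
}

int
demo_store_read (struct demo_store *s, size_t off, void *buf, size_t len)
{
  if (!in_range (s, off, len))
    return EINVAL;
  return s->cls->read (s, off, buf, len);
}

int
demo_store_write (struct demo_store *s, size_t off, const void *buf, size_t len)
{
  if (s->cls->write == NULL)
    return EROFS;
  if (!in_range (s, off, len))
    return EINVAL;
  return s->cls->write (s, off, buf, len);
}

/* One concrete backend: storage kept in a caller-supplied memory buffer. */
static int
mem_read (struct demo_store *s, size_t off, void *buf, size_t len)
{
  memcpy (buf, (const char *) s->hook + off, len);
  return 0;
}

static int
mem_write (struct demo_store *s, size_t off, const void *buf, size_t len)
{
  memcpy ((char *) s->hook + off, buf, len);
  return 0;
}

const struct demo_store_class demo_mem_class = { "memory", mem_read, mem_write };
```

A new storage type would add only another pair of hooks. Callers keep using demo_store_read and demo_store_write unchanged, which is the "minimum impact" property described above.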
FIXME: describe startup sequence
"query".
classes is the set of classes used to validate store types and
argument syntax.
struct store_argp_params.
The following functions provide basic management of stores:
store_set_flags, with the exception of
STORE_INACTIVE, which merely indicates that no attempt should be
made to activate an inactive store; if STORE_INACTIVE is not
specified, and the store returned for source is inactive, an attempt is
made to activate it (failure of which causes an error to be returned).
A reference to source is created (but may be destroyed with
store_close_source).
It is usually better to use a specific store open or create function
such as store_open (see section Store Classes), since they are
tailored to the needs of a specific store. Generally, you should only
use store_create if you are defining your own store class, or you
need options that are not provided by a more specific store creation
function.
store_create, remove the
reference to the source from which it was created.
struct store_run represents a contiguous region in a store's
address range. These are used to designate active portions of a store.
If start is -1, then the region is a hole (it is zero-filled
and doesn't correspond to any real addresses).
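A minimal sketch of how a reader might interpret a run list follows. It assumes a simplified stand-in for struct store_run, and the names demo_run and read_runs are hypothetical. Runs are walked in order. Holes are zero-filled without touching the underlying device, and all other runs copy real blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct store_run: START and LENGTH are in
   blocks, and START == -1 marks a hole.  */
struct demo_run
{
  long start;
  long length;
};

/* Assemble the logical contents described by RUNS into OUT, taking real
   blocks from DEV (BLOCK_SIZE bytes each).  Holes are zero-filled and read
   nothing from DEV.  Returns the number of bytes written to OUT.  */
size_t
read_runs (const struct demo_run *runs, size_t nruns,
           const char *dev, size_t block_size, char *out)
{
  size_t pos = 0;

  for (size_t i = 0; i < nruns; i++)
    {
      assert (runs[i].length >= 0);
      size_t bytes = (size_t) runs[i].length * block_size;

      if (runs[i].start == -1)
        memset (out + pos, 0, bytes);     /* hole: no real addresses */
      else
        memcpy (out + pos, dev + (size_t) runs[i].start * block_size, bytes);
      pos += bytes;
    }
  return pos;
}
```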
store_open_children. This is done heuristically, and so may not
succeed. If a child doesn't have a name, EINVAL is returned.
store_remap_create function, this may
simply modify source and return it.
The following functions allow you to read and modify the contents of a store:
store->block_size). Note that len is in bytes.
store->block_size).
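The read and write calls above count addresses in units of store->block_size but lengths in bytes. The hypothetical helper below (not part of libstore) shows the resulting conversion and bounds check. Returning EINVAL for out-of-range requests is a choice made for this sketch, not documented libstore behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ADDR counts blocks of BLOCK_SIZE bytes, LEN counts
   bytes, and STORE_BYTES is the total store size in bytes.  On success,
   store the byte offset corresponding to ADDR in *OFFSET.  */
int
demo_addr_to_offset (uint64_t addr, size_t len, size_t block_size,
                     uint64_t store_bytes, uint64_t *offset)
{
  assert (offset != NULL);
  if (block_size == 0 || addr > store_bytes / block_size)
    return EINVAL;
  uint64_t off = addr * block_size;   /* cannot exceed STORE_BYTES here */
  if (len > store_bytes - off)
    return EINVAL;
  *offset = off;
  return 0;
}
```

For example, with 512-byte blocks, block address 3 corresponds to byte offset 1536, whatever the length of the transfer.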
The store library comes with a number of standard store class implementations:
libstore.
If you are building your own class vectors, the following function may be useful:
query store
store_set_flags. A reference to the open file is created (but
may be destroyed with store_close_source).
typed_open store
store_typed_open. If every child uses the same
`type:' prefix, then it may be factored out and put before
the child list instead (the two notations are differentiated by whether
or not the first character of name is alphanumeric).
device store
file store
store_create, this will always use file I/O, even if it would
be possible to be more direct. This may work in more cases, for instance
if the file has holes. Consumes the file send right.
task store
zero store
copy store
vm_allocate, and will be consumed.
gunzip store
concat store
This mode is designed to increase storage capacity, so that when one substore is filled, new data is transparently written to the next substore. Concatenation requires robust hardware, since a failure in any single substore will wipe out a large section of the data.
store_open_children.
ileave store
This RAID mode is designed to increase storage performance, since I/O will probably occur in parallel if the substores reside on different physical devices. Interleaving works best with evenly-yoked substores: if the stores are different sizes, some space will not be used at the end of the larger stores; if the stores are different speeds, then I/O will have to wait for the slowest store; if some stores are not as reliable as others, failures will wipe out every nth storage block, where n is the number of substores.
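A corresponding sketch for interleaving follows. It assumes an interleave unit of one block, which matches the "every nth storage block" description above. Under that assumption, block b lives on child b mod n at child block b / n. Only n times the size of the smallest child is usable. The names ileave_map and ileave_usable_blocks are hypothetical.

```c
#include <assert.h>

/* Blocks usable when interleaving N children of the given sizes: every
   stripe needs one block on each child, so the smallest child limits it. */
long
ileave_usable_blocks (const long *sizes, int n)
{
  assert (n > 0);
  long min = sizes[0];
  for (int i = 1; i < n; i++)
    if (sizes[i] < min)
      min = sizes[i];
  return min * n;
}

/* Return the child holding block ADDR and set *CHILD_ADDR to the block
   within that child, or return -1 if ADDR is outside the usable range.  */
int
ileave_map (const long *sizes, int n, long addr, long *child_addr)
{
  if (addr < 0 || addr >= ileave_usable_blocks (sizes, n))
    return -1;
  *child_addr = addr / n;
  return (int) (addr % n);
}
```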
mvol store
This class is not included in store_std_classes, because it requires an application-specific callback.
remap store
store_remap function, this
function always operates by creating a new store of type `remap'
which has source as a child, and so may be less efficient than
store_remap for some types of stores.
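The following sketch shows the address translation a remapping store performs. It assumes the layout is given as a list of runs over the child store, each with a start and length in blocks. It also treats a start of -1 as a hole, following the struct store_run convention described earlier. Whether the remap class accepts holes is an assumption here. The type demo_run and the function remap_addr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified run: START and LENGTH are in blocks of the child store;
   START == -1 marks a hole.  */
struct demo_run
{
  long start;
  long length;
};

#define REMAP_HOLE (-1L)
#define REMAP_OUT_OF_RANGE (-2L)

/* Translate block ADDR of the remapped store into a block of its child,
   or return REMAP_HOLE / REMAP_OUT_OF_RANGE.  */
long
remap_addr (const struct demo_run *runs, size_t nruns, long addr)
{
  if (addr < 0)
    return REMAP_OUT_OF_RANGE;
  for (size_t i = 0; i < nruns; i++)
    {
      if (addr < runs[i].length)
        return runs[i].start == -1 ? REMAP_HOLE : runs[i].start + addr;
      addr -= runs[i].length;   /* move past this run */
    }
  return REMAP_OUT_OF_RANGE;
}
```

Every access to a wrapper store of this kind pays for the run walk before reaching the child. This is one reason a class-specific remap, which can simply adjust the source's own runs, may be more efficient.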
The store library also provides some functions which help transfer stores between tasks via RPC:
<hurd/hurd_types.h> for an explanation of the encodings for the
various storage types.
file_get_storage_info RPC, and deallocate
enc.
file_get_storage_info RPC.
store_enc_init, or return an error. The contents of enc
may then be returned as the value of file_get_storage_info; if
for some reason this can't be done, store_enc_dealloc may be used
to deallocate the memory used by the unsent vectors.
store_enc_dealloc.
allocate_encoding method in each child store of
store, propagating any errors. If any child does not have such a
method, EOPNOTSUPP is returned.
EOPNOTSUPP
is returned.
store_std_leaf_decode.
Stored filesystems allow users to save and load persistent data from any random-access storage media, such as hard disks, floppy diskettes, and CD-ROMs. Stored filesystems are required for bootstrapping standalone workstations, as well.
FIXME: finish
FIXME: finish
FIXME: finish
FIXME: finish
The diskfs library is declared in <hurd/diskfs.h>, and does a lot
of the work of implementing stored filesystems. libdiskfs
requires the threads, ports, iohelp, fshelp, and store libraries. You
should understand all these libraries before you attempt to use diskfs,
and you should also be familiar with the pager library (see section Pager Library).
For historical reasons, the library for implementing stored filesystems
is called libdiskfs instead of libstorefs. Keep in mind,
however, that diskfs is useful for filesystems which are implemented on
any block-addressed storage device, since it uses the store library to
do I/O.
Note that stored filesystems can be tricky to implement, since the diskfs callback interfaces are not trivial. It really is best if you examine the source code of a similar existing filesystem server, and follow its example rather than trying to write your own from scratch.
This subsection gives an outline of the general steps involved in implementing a filesystem server, to help refresh your memory and to offer explanations rather than to serve as a tutorial.
The first thing a filesystem server should do is parse its command-line arguments (see section Diskfs Arguments). Then, the standard output and error streams should be redirected to the console, so that error messages are not lost if this is the bootstrap filesystem.
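Here is a minimal sketch of the redirection step using only standard C freopen. It assumes the console is reachable through a path such as /dev/console. The helper name redirect_to_console is hypothetical, and Hurd filesystem servers may open the console by other means.

```c
#include <assert.h>
#include <stdio.h>

/* Reopen stdout and stderr on CONSOLE_PATH.  Returns 0 on success, or -1
   if either stream could not be reopened (freopen closes the original
   stream even when reopening fails).  */
int
redirect_to_console (const char *console_path)
{
  assert (console_path != NULL);
  if (freopen (console_path, "w", stdout) == NULL)
    return -1;
  if (freopen (console_path, "w", stderr) == NULL)
    return -1;
  /* Keep stderr unbuffered so diagnostics appear immediately.  */
  setvbuf (stderr, NULL, _IONBF, 0);
  return 0;
}
```

A server would call this right after argument parsing, for instance as redirect_to_console ("/dev/console") when it is running as the bootstrap filesystem.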
The following is a list of the relevant functions which would be called during the rest of the server initialization. Again, you should refer to the implementation of an already-working filesystem if you have any questions about how these functions should be used:
diskfs_root_node); at this point the pagers should be
ready to go.
fsys_startup on that port as appropriate and return
the realnode from that call; otherwise we call
diskfs_start_bootstrap and return MACH_PORT_NULL.
flags specifies how to open realnode (from the O_* set).
You should not need to call the following function directly, since
diskfs_startup_diskfs will do it for you, when appropriate:
diskfs_boot_flags is nonzero). All filesystem
initialization must be complete before you call this function.
The following functions implement standard diskfs command-line and runtime argument parsing, using argp (see section `Argp' in The GNU C Library Reference Manual):
EINVAL is returned if some option is
unrecognized. The default definition of this routine will parse them
using diskfs_runtime_argp.
*argz of length
*argz_len a NUL-separated list of the arguments to this
translator. The default definition of this routine simply calls
diskfs_append_std_options.
diskfs_get_options, argz and
argz_len must already have sane values.
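An argz vector is simply a sequence of NUL-terminated strings laid end to end, with its total length kept separately. The sketch below appends to such a vector in place. This is why argz and argz_len must already hold sane values (NULL and 0 for an empty vector) before an append. The helpers demo_argz_add and demo_append_options are hypothetical. Real code can use argz_add from glibc's <argz.h>, and the option strings shown are only illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append STR, including its terminating NUL, to the argz vector
   *ARGZ / *ARGZ_LEN, growing the buffer with realloc.  */
int
demo_argz_add (char **argz, size_t *argz_len, const char *str)
{
  size_t n = strlen (str) + 1;
  char *p = realloc (*argz, *argz_len + n);

  if (p == NULL)
    return ENOMEM;
  memcpy (p + *argz_len, str, n);
  *argz = p;
  *argz_len += n;
  return 0;
}

/* Hypothetical stand-in for appending "--foo" style options that describe
   a translator's current state; the option names are illustrative.  */
int
demo_append_options (char **argz, size_t *argz_len, int readonly,
                     int sync_interval)
{
  char buf[32];
  int err = demo_argz_add (argz, argz_len,
                           readonly ? "--readonly" : "--writable");

  if (!err && sync_interval > 0)
    {
      snprintf (buf, sizeof buf, "--sync=%d", sync_interval);
      err = demo_argz_add (argz, argz_len, buf);
    }
  return err;
}
```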
diskfs_set_options to handle runtime option parsing. The
default definition is initialized to a pointer to
diskfs_std_runtime_argp.
diskfs_runtime_argp points to this, although the user can
redefine that to chain this onto his own argp.
argp_parse on this to parse the command line, chain
it onto the end of his own argp structure, or ignore it completely.
struct store_parsed structure should be passed as the
input argument to argp_parse; FIXME xref the declaration for
STORE_ARGP.
The following functions and variables control the overall behaviour of the library. Your callback functions may need to refer to these, but you should not need to modify or redefine them.
io_identity identity port for the filesystem.
struct timeval by the maptime_read
C library function (FIXME xref).
fsys_shutdown.
Every file or directory is a diskfs node. The following functions help your diskfs callbacks manage nodes and their references:
np->dn_stat; update ctime, atime, and mtime
if necessary. If wait is true, then return only after the
physical media has been completely updated.
*amtread is
filled with the amount actually read.
dir_notice_changes. The type of modification and
affected name are type and name respectively. This should
be called by diskfs_direnter, diskfs_dirremove,
diskfs_dirrewrite, and anything else that changes the directory,
after the change is fully completed.
These next node manipulation functions are not generally useful, but may come in handy if you need to redefine any diskfs functions.
IFDIR, also initialize `.' and `..' in the new
directory. Return the node in npp. cred identifies the
user responsible for the call. If name is nonzero, then link the
new node into dir with name name; ds is the result of
a prior diskfs_lookup for creation (and dir has been held
locked since). dir must always be provided as at least a hint for
disk allocation strategies.
np->dn_set_ctime is set, then modify
np->dn_stat.st_ctime appropriately; do the analogous
operations for atime and mtime as well.
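The following sketch models the timestamp update just described. It uses a hypothetical demo_node whose flags stand in for dn_set_atime, dn_set_mtime and dn_set_ctime. Clearing each flag once its time has been stamped is an assumption made so that repeated calls are harmless. Check libdiskfs itself for the exact behaviour.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for a diskfs node.  */
struct demo_node
{
  time_t a_time, m_time, c_time;        /* like dn_stat.st_atime etc. */
  int set_atime, set_mtime, set_ctime;  /* like dn_set_atime etc. */
};

/* Stamp each time whose flag is raised with NOW and clear the flag (an
   assumption of this sketch); times whose flags are clear stay as-is.  */
void
demo_set_node_times (struct demo_node *np, time_t now)
{
  assert (np != NULL);
  if (np->set_atime)
    {
      np->a_time = now;
      np->set_atime = 0;
    }
  if (np->set_mtime)
    {
      np->m_time = now;
      np->set_mtime = 0;
    }
  if (np->set_ctime)
    {
      np->c_time = now;
      np->set_ctime = 0;
    }
}
```

Deferring the update through flags lets many operations mark a node as modified cheaply, while the actual clock read and stat change happen once.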
*npp.
Like several other Hurd libraries, libdiskfs depends on you to
implement application-specific callback functions. You must
define the following functions and variables, but you should also look
at section Diskfs Options, as there are several defaults which should be
modified to provide good filesystem support:
diskfs_lookup and a call to one of diskfs_direnter,
diskfs_dirremove, or diskfs_dirrewrite. It must contain
enough information so that those calls work as described below.
struct dirstat.
dir_rename does not know
how to succeed if only one link per file is allowed; on such formats you
need to reimplement dir_rename yourself.
dir_pathtrans. If this is exceeded, dir_pathtrans will
return ELOOP.
*statfsbuf with appropriate values to reflect the
current state of the filesystem.
diskfs_lookup, because it is simply a
wrapper for diskfs_lookup_hard, and is already defined in
libdiskfs.
Look up the name name in directory dp (which is locked).
type will either be LOOKUP, CREATE, RENAME,
or REMOVE. cred identifies the user making the call.
If the name is found, return zero, and (if np is nonzero) set
*np to point to the node for it, which should be locked.
If the name is not found, return ENOENT, and (if np is
nonzero) set *np to zero. If np is zero, then the
node found must not be locked, not even transitorily. Lookups for
REMOVE and RENAME (which must often check permissions on
the node being found) will always set np.
If ds is nonzero then the behaviour varies depending on the requested lookup type:
LOOKUP
Set *ds to be ignored by diskfs_drop_dirstat.
CREATE
On success, set *ds to be ignored by diskfs_drop_dirstat. On failure, set *ds for a future call to diskfs_direnter.
RENAME
On success, set *ds for a future call to diskfs_dirrewrite. On failure, set *ds for a future call to diskfs_direnter.
REMOVE
On success, set *ds for a future call to diskfs_dirremove. On failure, set *ds to be ignored by diskfs_drop_dirstat.
The caller of this function guarantees that if ds is nonzero, then
either the appropriate call listed above or diskfs_drop_dirstat
will be called with ds before the directory dp is unlocked,
and guarantees that no lookup calls will be made on this directory
between this lookup and the use (or destruction) of *ds.
If you use the library's versions of diskfs_rename_dir,
diskfs_clear_directory, and diskfs_init_dir, then lookups
for `..' might have the flag SPEC_DOTDOT ORed in. This has a
special meaning depending on the requested lookup type:
LOOKUP
CREATE
SPEC_DOTDOT is guaranteed not to be
given.
RENAME
REMOVE
The node being found (*np) is already held locked, so don't lock it or
add a reference to it.
Return ENOENT if name isn't in the directory. Return
EAGAIN if name refers to the `..' of this filesystem's
root. Return EIO if appropriate.
diskfs_direnter, because it is simply a
wrapper for diskfs_direnter_hard, and is already defined in
libdiskfs.
Add np to directory dp under the name name. This will
only be called after an unsuccessful call to diskfs_lookup of type
CREATE or RENAME; dp has been locked continuously
since that call and ds is as that call set it; np is locked.
cred identifies the user responsible for the call (to be used only
to validate directory growth).
diskfs_dirrewrite, because it is simply a
wrapper for diskfs_dirrewrite_hard, and is already defined in
libdiskfs.
This will only be called after a successful call to diskfs_lookup
of type RENAME; this call should change the name found in
directory dp to point to node np instead of its previous
referent. dp has been locked continuously since the call to
diskfs_lookup and ds is as that call set it; np is
locked.
diskfs_dirrewrite has some additional specifications: name
is the name within dp which used to correspond to the previous
referent, oldnp; it is this reference which is being rewritten.
diskfs_dirrewrite also calls diskfs_notice_dirchange if
dp->dirmod_reqs is nonzero.
diskfs_dirremove, because it is simply a
wrapper for diskfs_dirremove_hard, and is already defined in
libdiskfs.
This will only be called after a successful call to diskfs_lookup
of type REMOVE; this call should remove the name found (as recorded
in ds) from directory dp. dp has been locked continuously since the
call to diskfs_lookup and ds is as that call set it.
diskfs_dirremove has some additional specifications: this routine
should call diskfs_notice_dirchange if
dp->dirmod_reqs is nonzero. The entry being removed has
name name and refers to np.
diskfs_lookup on
directory dp; this function is guaranteed to be called if
diskfs_direnter, diskfs_dirrewrite, and
diskfs_dirremove have not been called, and should free any state
retained by a struct dirstat. dp has been locked
continuously since the call to diskfs_lookup.
diskfs_drop_dirstat will ignore it.
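The lookup contract above is easiest to see with a toy directory. In this sketch, all names are hypothetical, including demo_lookup, demo_direnter, demo_dirremove, demo_drop_dirstat and the DEMO_* lookup types. The dirstat records only which slot the follow-up call must touch:

- A failed CREATE or RENAME lookup remembers a free slot for the enter call.
- A successful RENAME or REMOVE lookup remembers the matching slot.
- Every other case leaves nothing but drop_dirstat-style cleanup.

Locking, permissions, rewriting, and directory growth are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_NSLOTS 8
#define DEMO_NAMELEN 16

enum demo_lookup_type { DEMO_LOOKUP, DEMO_CREATE, DEMO_RENAME, DEMO_REMOVE };

/* Toy directory: fixed slots; an empty name marks a free slot.  */
struct demo_dir
{
  char names[DEMO_NSLOTS][DEMO_NAMELEN];
  int nodes[DEMO_NSLOTS];
};

/* Toy dirstat: slot for the follow-up call, or -1 if only drop applies. */
struct demo_dirstat
{
  int slot;
};

int
demo_lookup (struct demo_dir *dp, const char *name,
             enum demo_lookup_type type, int *np, struct demo_dirstat *ds)
{
  int free_slot = -1;

  for (int i = 0; i < DEMO_NSLOTS; i++)
    {
      if (dp->names[i][0] == '\0')
        {
          if (free_slot < 0)
            free_slot = i;
        }
      else if (strcmp (dp->names[i], name) == 0)
        {
          if (np)
            *np = dp->nodes[i];
          if (ds)   /* success: RENAME rewrites, REMOVE removes this slot */
            ds->slot = (type == DEMO_RENAME || type == DEMO_REMOVE) ? i : -1;
          return 0;
        }
    }
  if (np)
    *np = 0;
  if (ds)       /* failure: CREATE and RENAME will enter NAME here */
    ds->slot = (type == DEMO_CREATE || type == DEMO_RENAME) ? free_slot : -1;
  return ENOENT;
}

/* Follow-up to a failed CREATE or RENAME lookup.  */
int
demo_direnter (struct demo_dir *dp, const char *name, int node,
               struct demo_dirstat *ds)
{
  size_t len = strlen (name);

  if (ds->slot < 0)
    return ENOSPC;          /* a real filesystem would grow the directory */
  if (len >= DEMO_NAMELEN)
    return ENAMETOOLONG;
  memcpy (dp->names[ds->slot], name, len + 1);
  dp->nodes[ds->slot] = node;
  ds->slot = -1;
  return 0;
}

/* Follow-up to a successful REMOVE lookup.  */
int
demo_dirremove (struct demo_dir *dp, struct demo_dirstat *ds)
{
  assert (ds->slot >= 0);
  dp->names[ds->slot][0] = '\0';
  dp->nodes[ds->slot] = 0;
  ds->slot = -1;
  return 0;
}

/* Used when no follow-up call happens; this toy holds nothing to free.  */
void
demo_drop_dirstat (struct demo_dirstat *ds)
{
  ds->slot = -1;
}
```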
*data with the entries,
which currently points to *datacnt bytes. If it isn't big
enough, vm_allocate into *data. Set
*datacnt to the total size used. Fill amt with the
number of entries copied. Regardless, never copy more than bufsiz
bytes. If bufsiz is zero, then there is no limit on
*datacnt; if n is -1, then there is no limit on
amt.
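The two limits in the preceding description interact in a way that a small sketch makes clearer. This one assumes hypothetical fixed-size directory entries of ENTRY_SIZE bytes and a hypothetical starting index first. It uses malloc where a real callback would use vm_allocate. demo_get_directs is not a libdiskfs function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ENTRY_SIZE 8   /* hypothetical fixed size of one directory entry */

/* Copy entries starting at index FIRST from the NENTRIES entries packed in
   ENTRIES: at most N entries (N == -1: no limit) and at most BUFSIZ bytes
   (BUFSIZ == 0: no limit).  On entry *DATA points to *DATACNT bytes; if
   that is too small, a new buffer is malloc'd.  On return *DATACNT is the
   number of bytes used and *AMT the number of entries copied.  */
int
demo_get_directs (const char *entries, int nentries, int first, int n,
                  size_t bufsiz, char **data, size_t *datacnt, int *amt)
{
  assert (first >= 0 && n >= -1);
  int count = first < nentries ? nentries - first : 0;

  if (n != -1 && n < count)
    count = n;
  if (bufsiz != 0 && (size_t) count * ENTRY_SIZE > bufsiz)
    count = (int) (bufsiz / ENTRY_SIZE);

  size_t need = (size_t) count * ENTRY_SIZE;
  if (need > *datacnt)
    {
      char *p = malloc (need);
      if (p == NULL)
        return ENOMEM;
      *data = p;
    }
  if (need > 0)
    memcpy (*data, entries + (size_t) first * ENTRY_SIZE, need);
  *datacnt = need;
  *amt = count;
  return 0;
}
```

The byte limit only ever shrinks the entry count, so a caller can always tell how many whole entries it received from amt alone.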
diskfs_clear_directory and
diskfs_init_directory, then `empty' means possessing only the entries
labelled `.' and `..'. cred identifies the user making
the call; if this user cannot search the directory, then this
routine should fail.
diskfs_node_translated is
true) look up the name of its translator. Store the name into newly
malloced storage and set *namelen to the total length.
diskfs_shortcut_symlink is set) then this
should clear the symlink, even if diskfs_create_symlink_hook
stores the link target elsewhere.
np->allocsize to the actual
allocated size. If the allocated size is already size bytes, do
nothing. cred identifies the user responsible for the call.
diskfs_node_reload is
subsequently called on all active nodes, so this call doesn't need to
reread any node-specific data.
*np to be the newly
allocated node.
diskfs_node_update (where np->dn_stat.st_mode was
zero). np's mode used to be mode.
diskfs_lost_hardrefs.
*np if it shouldn't be retained.
diskfs_node_refcnt_lock is held.
np->dn_stat and any associated
format-specific information to the disk. If wait is true, then
return only after the physical media has been completely updated.
diskfs_node_update for much
of the metadata. If wait is true, then return only after the
physical media has been completely updated.
MACH_PORT_NULL and set errno.
struct pager * that refers to the pager returned by
diskfs_get_filemap for locked node np, suitable for use as an argument
to pager_memcpy.
prot parameter (the second
argument to diskfs_get_filemap) for all active user pagers.
The functions and variables described in this subsection already have
default definitions in libdiskfs, so you are not forced to define
them; rather, they may be redefined on a case-by-case basis.
You should set the values of any option variables as soon as your program starts (before you make any calls to diskfs, such as argument parsing).
diskfs_create_symlink_hook and diskfs_read_symlink_hook)
return EINVAL or are not defined. The library knows that the
dn_stat.st_size field is the length of the symlink, even if the
hook functions are used.
diskfs_set_sync_interval is called with this value when the first
diskfs thread is started up (in diskfs_spawn_first_thread). This
variable has a default value of 30, which causes disk buffers to
be flushed at least every 30 seconds.
It must always be possible to clear the mode or the flags; diskfs will not ask for permission before doing so.
diskfs_shortcut_symlink
is set) it is called to set a symlink. If it returns EINVAL or
isn't set, then the normal method (writing the contents into the file
data) is used. If it returns any other error, it is returned to the
user.
diskfs_shortcut_symlink
is set) it is called to read the contents of a symlink. If it returns
EINVAL or isn't set, then the normal method (reading from the
file data) is used. If it returns any other error, it is returned to
the user.
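The fallback rule for the symlink hooks can be sketched as follows, using the setting hook. The reading hook follows the same pattern. The function demo_set_symlink and the example callbacks are hypothetical and show only the decision logic:

- The hook is consulted only when the shortcut flag is set.
- EINVAL from the hook means "use the normal method".
- Any other error goes back to the user.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: 0 if the link was stored, EINVAL to decline,
   or another error code to fail the request.  */
typedef int (*demo_symlink_fn) (void *node, const char *target);

int
demo_set_symlink (void *node, const char *target, int shortcut,
                  demo_symlink_fn hook, demo_symlink_fn write_file_data)
{
  assert (write_file_data != NULL);
  if (shortcut && hook != NULL)
    {
      int err = hook (node, target);
      if (err != EINVAL)
        return err;         /* handled (0), or a real error for the user */
    }
  /* Hook absent, not consulted, or declined: use the normal method.  */
  return write_file_data (node, target);
}

/* Example callbacks, for illustration only.  */
static int data_writes;     /* how often the normal method was used */

static int
write_into_file_data (void *node, const char *target)
{
  (void) node;
  (void) target;
  data_writes++;
  return 0;
}

static int
hook_declines (void *node, const char *target)
{
  (void) node;
  (void) target;
  return EINVAL;
}

static int
hook_out_of_space (void *node, const char *target)
{
  (void) node;
  (void) target;
  return ENOSPC;
}
```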
diskfs_lookup on pdp.
The new directory must be clear within the meaning of
diskfs_dirempty. This routine assumes the usual convention where
`.' and `..' are represented by ordinary links; if that is not
true for your format, you have to redefine this function. cred
identifies the user making the call.
The library also exports the following functions, but they are not generally useful unless you are redefining other functions the library provides.
po->np
must be locked.
po->np must
be locked.
diskfs_start_protid;
the user to install is user.
root_parent,
shadow_root, and shadow_root_parent fields are copied from
context if it is nonzero; otherwise each of these values is
set to zero.
S_fsys_startup for execserver
bootstrap. The execserver is able to function without a real node,
hence this fraud. Arguments are as for fsys_startup in
<hurd/fsys.defs>.
libports messages on diskfs ports.
The diskfs library also provides functions to demultiplex the fs, io,
fsys, interrupt, and notify interfaces. All the server routines have
the prefix diskfs_S_. For those routines, incoming arguments of
type file_t or io_t appear as struct protid * to
the stub.
In the Hurd, translators are capable of redirecting filesystem requests to other translators, which makes it possible to implement alternative views of the same underlying data. The translators described in this chapter do not provide direct access to any data; rather, they are organizational tools to help you simplify an existing physical filesystem layout.
Be prudent with these translators: you may accidentally injure people who want their filesystems to be rigidly tree-structured.(10)
FIXME: finish
Distributed filesystems are designed to share files between separate machines via a network connection of some sort. Their design differs significantly from that of stored filesystems (see section Stored Filesystems): they need to deal with the problems of network delays and failures, and may require complex authentication and replication protocols involving multiple file servers.
FIXME: finish
FIXME: finish
FIXME: finish
FIXME: this subsystem is in flux
FIXME: net frobbing stuff may be added to socket.defs
FIXME: finish
FIXME: finish
FIXME: finish
FIXME: finish
FIXME: finish
concat store
copy store
device drivers
device store
file store
gunzip store
ileave store
mvol store
query store
remap store
task store
typed_open store
zero store
This document was generated on 29 June 1999 using texi2html 1.56k.