
Re: clusters, infrastructures, and package tools



[I'm not on the list, would you please cc to <bud@sistema.it>?]

At 10:16 PM 08-02-00 +0000, Eray Ozkural wrote:
>>  "Bud P. Bruegger" wrote: <bud@sistema.it>] I'm interested in the
>>administration of clusters and infrastructures (
>>........

> think most of the required software is already available, you just need to
> tailor them to debian. Though *packages* that do it would be nice.

There are solutions out there such as Depot and Sup (see "Bootstrapping an
Infrastructure" by Steve Traugott and Joel Huddleston, found at
www.infrastructures.org), or there is SEPP by Tobi Oetiker, and there are
other approaches (most of them are listed in
http://www.sistema.it/twiki/bin/view/Main/infrastructures).  But they
usually have their own "package format", and installing something on an
infrastructure takes the effort of creating a source package plus compiling...

As an ideal solution that we should attempt in the medium term, a package
approach would be much easier: I envision a single source package from
which one can automatically produce binary packages for multiple processor
architectures and base OSes (Debian, Solaris, AIX, IRIX, etc.).  A
specialized (or modified Debian) package tool would install that in the
right place (infrastructure management is often based on a global
filesystem where every architecture has its subtree).  The configuration
changes that are part of installation (in Debian done by install scripts)
have to be made compatible with a centralized configuration management
approach.  
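
To make the "every architecture has its subtree" idea concrete, here is a
minimal Python sketch.  It is only an illustration under assumptions that
are not in the discussion above: the root /infra, the "<os>-<arch>" naming
and the function names are all hypothetical, and no existing tool (dpkg,
Depot, SEPP) is claimed to lay things out this way.

```python
from pathlib import PurePosixPath

# Hypothetical root of the global filesystem (an assumption for illustration).
GLOBAL_ROOT = PurePosixPath("/infra")


def install_prefix(os_name, arch, package, version):
    """Return the subtree under which one binary package would be unpacked."""
    for part in (os_name, arch, package, version):
        # Reject empty components and anything that could escape the subtree.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return GLOBAL_ROOT / f"{os_name}-{arch}" / package / version


def build_matrix(package, version, targets):
    """Map each (os, arch) target of one source package to its install prefix."""
    return {
        (os_name, arch): install_prefix(os_name, arch, package, version)
        for os_name, arch in targets
    }


if __name__ == "__main__":
    targets = [("debian", "i386"), ("debian", "alpha"), ("solaris", "sparc")]
    for target, prefix in build_matrix("foo", "1.0", targets).items():
        print(target, "->", prefix)
```

One source package then yields one binary package per target, and each
lands in its own subtree, so machines of different architectures can share
the same global filesystem without stepping on each other.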

While I believe that a first workable solution can be found that
requires only small developments and modifications and mostly pulls
together existing tools, it seems that there is no readily worked-out
solution for the problem.  My impression from the discussion on
infrastructures.org is that many people have hand-rolled solutions, that
there is a lack of collaboration and discussion, and I'm not aware of any
package-based solution...  Since I'm using exclusively Debian in my company
(and I like it), extending the Debian approach to work in a more
general/heterogeneous context seems to be a good approach.

> Err, a heterogeneous cluster is not that different from a lab network or a
> company's intranet. So we could view it as a cluster software
> setup/maintenance problem

Agreed, there are many mini-infrastructures that could greatly benefit from
an infrastructures approach.  My company has some kind of a
mini-infrastructure problem (relatively few machines and all Debian).  But
solving only the restricted problem would cut out a very large number of
professional sys ops who have to deal with multiple platforms.  That would
be a pity.  Also, I believe a possible Debian cluster project should take
the extensive experience in the infrastructures management field into
account (I'm therefore collecting links on my site...).

 
>   Okay, but let's not try to mod dpkg, it's already pretty loaded :)
> There were a few tools which did part of that. Though a common
> configuration environment suitable for master/slave roles would be all
> right, and which does away with problems that stem from different
> architectures (by being conservative, and managing arch-specific stuff)

Interesting-sounding existing tools include:
* Depot, SEPP, pgklink, GNU stow and similar tools
* CVSup, SUP, rsync, or alternatively (persistently) caching filesystems
such as Coda or InterMezzo (my favorites), or, not as cool, also AFS, DFS
and if you really wanna suffer NFS.
* Automatic installation tools such as FAI (Debian-based), or CluClo (comes
from the Beowulf world), and some others that have not been published but
would be available (Jon Stearley of UNM has something that he ran on Debian).
* GNU cfengine 
and maybe I forgot some important things (see
http://www.sistema.it/twiki/bin/view/Main/infrastructures for the URLs of
all these things).

What is missing is:

* to figure out how package tools fit in and how cluster
installation/upgrade can be made easy and quick (for those who want
this--some insist on packaging and compiling by hand).

* how to best centrally manage configuration and make the package tools
interact with centralized config management.

> Ooops, the larger a beowulf, the more likely that you'll have different
>archs. 
>Even "expanding" a beowulf, say by "merging" two homogenous beowulf 
>clusters is problematic. (So, it might be cheesy to add support for cluster
>of clusters) 

An infrastructure is more heterogeneous still, since it mixes in other OSes
(Solaris, AIX) on top of the multiple CPU architectures, whereas in a
Beowulf the different architectures all run Linux and even Debian...

Merging:  The infrastructures approach as I understand it stores all state
of the cluster in a central place (usually called "gold server").  From
there, configuration (installed binaries, config files, etc.) automatically
propagates to all "client machines":  you add a new machine or replace one
and a tool such as FAI boots over the network, partitions the disk, and
installs everything necessary.  Config and changes may come down from the
gold server via cfengine.  In this context, you can add machines to the
cluster and they should get their changes automatically.  There is a
minimum needed to participate in a cluster:
* boot floppy or boot rom on a virgin machine or
* cfengine or similar on a machine that is already installed to receive the
config changes.
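
The gold-server idea can be sketched in a few lines of Python.  This is a
toy model under my own assumptions, not how FAI or cfengine actually work:
the state of the gold server and of a client are both represented as plain
dicts mapping a path to file content, and the names plan_changes and
converge are hypothetical.

```python
import hashlib


def digest(content):
    """Checksum used to decide whether a file differs from the gold copy."""
    return hashlib.sha256(content).hexdigest()


def plan_changes(gold, client):
    """Compare gold-server state with client state (dicts of path -> bytes)."""
    install = sorted(p for p in gold if p not in client)
    update = sorted(
        p for p in gold if p in client and digest(gold[p]) != digest(client[p])
    )
    remove = sorted(p for p in client if p not in gold)
    return {"install": install, "update": update, "remove": remove}


def converge(gold, client):
    """Bring the client in line with the gold server; return what was done."""
    plan = plan_changes(gold, client)
    for path in plan["install"] + plan["update"]:
        client[path] = gold[path]
    for path in plan["remove"]:
        del client[path]
    return plan


if __name__ == "__main__":
    gold = {"/etc/motd": b"welcome\n", "/etc/hosts": b"10.0.0.1 gold\n"}
    new_machine = {}  # a freshly booted machine has nothing yet
    print(converge(gold, new_machine))
    print(converge(gold, new_machine))  # second run finds nothing to do
```

A new or replaced machine is simply a client with empty state, so merging
it into the cluster is the same operation as keeping an existing machine
up to date, and running it a second time changes nothing.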

  
> And probably Progeny Linux will be crying out 
>*loud* for some of the stuff you ask. :) [Or they might find flawless
>automation 
> ] 

I searched for Progeny on the distribution page of lwn.net, but didn't find
anything.  Do you have a URL?

>   Err, so you want debian clusters to contain non-debian nodes? Why? :)
> But I think it could be supported. For instance, you could make a server
> for a lab with both linux and solaris machines.

Most larger installations (infrastructures) have a wild mixture of machines
for historical and/or political reasons or because some applications simply
don't run on Linux.  And there is no choice--even if the managers of these
infrastructures would love to have only Debian...  I believe the goal for
Debian should be to become the most infrastructure-friendly Linux
distribution and possibly to extend the scope of its approach into the
non-Linux domain to bring ease and homogeneity of cluster administration.
 
> If deb had good support for 
>such systems, it could gain some popularity in the eyes of managers,
>considering 
>the cost of such services supplied by proprietary software. Business
>boffins are 
>gonna love it! ;) 

I would, and some boffin friends of mine would too :-).  Apart from that,
this is not just a matter of cost--the proprietary solutions that may be
out there may work fine in a homogeneous single-vendor environment, but
AFAIK there are few commercial solutions for heterogeneous infrastructures.
This is a turf that is ideal for open source.  Most open source software
out there is multi-platform (Unix-like systems), so why artificially
restrict Debian tools to only Linux?  As a matter of fact, people already
port them and the first moves to embrace more of the computing world are
happening...

--bud

/------------------------------------------------------------------------\
| Bud P. Bruegger, Ph.D.  |  mailto:bud@sistema.it                       |
| Sistema                 |  http://www.sistema.it                       |
| Information Systems     |  voice general: +39-0564-418667              |
| Via U. Bassi, 54        |  voice direct:  +39-0564-418667 (internal 41)|
| 58100 Grosseto          |  fax:           +39-0564-426104              |
| Italy                   |  P.Iva:         01116600535                  |
\------------------------------------------------------------------------/

