
Re: OT: Language War (Re: "C" Manual)



On Sun, Jan 06, 2002 at 04:51:14PM -0800, Erik Steffl wrote:
| dman wrote:
| > 
| > On Sat, Jan 05, 2002 at 09:38:01PM -0800, Erik Steffl wrote:
| > | dman wrote:
| > ...
| > | > In C/C++ there is an invariant on strings ("char*", which is
| > | > essentially equivalent to "char[]") that they end with a NUL byte.
| > |
| > |   no, that's not true.
| > 
| > It is true.  A type is more than the name a compiler gives it.  A type
| 
| the type in C is only what type in C is. what the type is in your head
| is irrelevant.

No, a type is a certain thing, irrespective of the language you are
currently working with.  A type is a higher-level thing than a
compiler is.  Each language has its own way of (trying to) express
types, but by necessity each is limited and incomplete.  Will you try
to convince me that python has no types?  You never tell the compiler
the type of object an identifier will refer to, but you still get
TypeError exceptions.
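
A minimal sketch of that point in Python 3 (the helper name
describe_failure is made up for illustration): no type is ever
declared for an identifier, yet the objects they refer to carry types
that are checked at run time.

```python
def describe_failure(a, b):
    """Try a + b; return the exception's class name, or None if it worked."""
    try:
        a + b
    except TypeError as exc:
        return type(exc).__name__
    return None

x = 1        # no declaration: x simply refers to an int object
y = "one"    # y refers to a str object
print(type(x).__name__, type(y).__name__)
print(describe_failure(x, y))   # int + str is rejected at run time
print(describe_failure(x, 2))   # int + int is fine
```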

[...] 
| > | > A better approach is to use classes with set interfaces to ensure that
| > | > any invariants on the type can't be broken, but C doesn't have such
| > | > capabilities.  It is also better if the system can perform those
| > | > checks for you, but it isn't always possible or feasible.
| > | >
| > | > Eiffel provides a way for you to specify the invariants,
| > | > preconditions, and postconditions of portions of code (classes and
| > | > methods); but even so, not everything can be checked.
| > |
| > |   that's all good, but the point here is that it's not type issue. the
| > | type of variable and the domain of valid values are two separate issues
| > | (well, you could define a new type where type would be the same as valid
| > | domain but it's not always possible).
| > 
| > It is a type issue.  As stated above a type is more than a name; it
| > also includes all the invariants that must be kept.
| 
|   type is what given language considers a type. not what you (or I)
| consider a type. and also, you need to differentiate between types and
| domains of valid values (not sure if these are correct english math
| terms but think of your math classes (mathematical analysis) and I'm
| quite sure you'll know what that means - most of math fairy tales start
| with: for all x, y from N, where x<y (N is type, condition after where
| specifies domain of valid values))
| 
| > | > | > Therefore, I claim that type safety is a more fundamental concept than
| > | > | > resource management.
| > | > |
| > | > |   well, why?
| > | >
| > | > See the above discussion of invariants, preconditions, and
| > | > postconditions.  Those are part of a "type", but have little to do
| > | > with resource management.
| > |
| > |   above discussion is irrelevant to the issue at hand, there is no such
| > | type in C and the preconditions are only for specific functions (and are
| > | more part of specifying valid domain that valid types).
| > |
| > |   if you don't believe it, just check it again - there is no type error
| > | in the C code quoted above. it's the value that's wrong for given
| > | function (operator <<).
| > 
| > I agree that the C compiler has an incomplete notion of 'types' (based
| > solely on name) which is why *it* doesn't tell you that you made an
| > error.
| 
|   yes, but:
| 
|   solely on name: not sure what you mean here, you refer to types by
| names in all languages,

In python you only refer to a type by referencing a factory for that
type (class objects are factories for their own instances).  Eg the
function you have below would be written as :

def operator_star( i , j ) :
    if i < j : segfault()
    else : return float( i*j )

note that there is no mention of the types of i, j, or the return
value.  The closest it comes is calling a function named 'float' that
converts its argument to a floating point object.
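
A runnable variant, as a sketch only: segfault() is not a real Python
function, so this version assumes raising ValueError is an acceptable
stand-in.  The same untyped definition then accepts ints, floats, and
Fractions alike.

```python
from fractions import Fraction

def operator_star(i, j):
    # Stand-in for the original segfault(): raise instead of crashing.
    if i < j:
        raise ValueError("requires i >= j")
    return float(i * j)

print(operator_star(3, 2))                            # two ints
print(operator_star(2.5, 2))                          # a float and an int
print(operator_star(Fraction(3, 2), Fraction(1, 2)))  # two Fractions
```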

| it's just that definition of type might be more
| complicated, in C you have certain primitive types that you can use to
| construct new type given specific operations (typedef, struct, ...), in
| some languages you might have a way to specify type using 'procedural'
| conditions.

'typedef' in C behaves much like #define for this purpose, except for
how, eg, '*' binds to each declarator.  It just creates a new name for
the same old type.

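Ex :
    /* macro: plain textual substitution */
    #define string char*
    string s ;          /* expands to: char* s ;      */
    string t ,  u  ;    /* expands to: char* t , u ;  */

    /* typedef: an alias for the type itself */
    typedef char* string ;
    string s ;          /* s is a char*               */
    string t , u ;      /* t and u are both char*     */

the only difference is that in the 3rd line of the first example the
'*' binds to 't' alone, so 'u' ends up a plain char.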

'struct' creates a name that refers to a combination of other names,
but still that is not the complete type.

|   in the example above (sort of, it's not quoted anymore) there is only
| one variable involved and so you can try to make it an issue of type,
| why is it wrong and why you have to consider types and domains of valid
| values separately is clear when using more than one variable.
| 
|   imagine that you have a binary operator that only works for certain
| values of parameters:

Ok :

| int operator * (int i, int j)
| {
|   if(i<j) segfault()
|   else    return (double)i * (double)j;
| }

The complete type of 'j' is _not_ int.  The complete type is an
integer that is not greater than i.

|   is there any way to specify types of i and j so that it never
| segfaults? no.

I agree that C has no way of expressing that, which is why you need to
manually insert runtime checks on the values.  You are asserting an
invariant on the real type of 'j' since the C compiler can't do it for
you.

VDM-SL is a modelling language heavily based on discrete mathematics
and logic.  In VDM-SL you create a type by beginning with a name and
equating it to a built-in type (eg 'int', 'real', or 'token'[1]).
Then you list all of the invariants that your new type has.  It is a
formal language, but it is not a programming language, thus it is
possible to completely specify the types.
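
The same shape (a name, a base type, then a list of invariants) can be
roughly imitated at run time in Python.  This is only a sketch and not
VDM-SL: make_type and PositiveEven are hypothetical names, and the
checks happen when a value is passed in rather than in a formal
specification.

```python
def make_type(name, base, *invariants):
    """Return a checker for values of `base` that satisfy every invariant."""
    def check(value):
        # Note: bool is a subclass of int; this sketch does not exclude it.
        if not isinstance(value, base):
            raise TypeError(
                f"{name}: expected {base.__name__}, got {type(value).__name__}"
            )
        for invariant in invariants:
            if not invariant(value):
                raise ValueError(f"{name}: invariant violated by {value!r}")
        return value
    check.__name__ = name
    return check

# A name, a built-in base type, then the invariants.
PositiveEven = make_type("PositiveEven", int,
                         lambda n: n > 0,
                         lambda n: n % 2 == 0)

print(PositiveEven(4))
try:
    PositiveEven(3)
except ValueError as exc:
    print(exc)
```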

| in general you have to consider the values separately from type. and
| IMO it makes perfect sense in case of 'string' in C - it's just a
| special value (special for certain functions, NOT for C language) of
| character array.

When writing code in a statically typed language you do write out the
valid values (invariants) separately from the "types" because of the
limitations in the language.  When you write code in, eg, python, you
only consider the valid values since those are what make up a type.

I've just come up with a good description of what a 'type' is :
    A "type" is the set of all valid values.
We like to give names to these sets so that we can talk about them in
a sane manner.  Have you read "The Lord of the Rings"?  Do you remember
what Treebeard told Merry and Pippin about his name?  In his native
tongue, Treebeard's name is his life story; that is how he is identified.
Since the hobbits (and we as readers of the story) aren't up for that
sort of detail the abbreviated name "Treebeard" can be used to identify
him.  This is what types are like -- in reality they are the set of
all valid values, but in practice we use a condensed set of common
names to approximate it.

When you say that, in C, something is an 'int', is it possible to have
a bit pattern there that is not a valid 'int'?  No.  'int' describes
the set of all valid values and every possible bit pattern you can
stick there is a valid int.  In many cases when you write 'int' in
your code (for example the java.util.Vector.elementAt() method) you
don't really want all the valid values that 'int' describes, but
merely a subset of them (0<=i<length).  The problem is that today's
programming languages don't provide a mechanism to express this so
programmers approximate it with types that describe supersets of the
set they want.  (this explains why I dislike java and its type system
so much; for C it is acceptable because C excels at the low-level
problems it was intended to solve)
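
As a sketch of what naming that subset could look like, here is a
hypothetical Index class in Python.  The check still happens at run
time, when the value is created, so it approximates the idea rather
than giving the language a real subset type.

```python
class Index(int):
    """An int restricted to the subset 0 <= i < length."""

    def __new__(cls, i, length):
        if not 0 <= i < length:
            raise ValueError(f"{i} is not in the range 0 <= i < {length}")
        return super().__new__(cls, i)

items = ["a", "b", "c"]
i = Index(2, len(items))   # accepted: 0 <= 2 < 3
print(items[i])            # an Index can be used wherever an int can

try:
    Index(3, len(items))   # rejected before it is ever used as an index
except ValueError as exc:
    print(exc)
```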

One reason that we use names like "string" when discussing C
programs/libraries/functions is because as humans we fully understand
what the "string" type entails -- a certain subset of all the possible
character arrays.  When we write that in C we write "char*" (or
char[]) because the C compiler doesn't understand any more than that.
Have you read the book "Design Patterns" by the Gang of Four?  That
book carefully describes many patterns, all named, so that as
developers we can simply mention the name in a discussion and each
will understand what is intended.  This is better than re-inventing
each pattern every time we use it because we limit our expressiveness
to that of C (or C++ or Java or Assembly, or $LANGUAGE).  You are
correct to say that the C/C++/Java/whatever compiler doesn't
understand the type "Singleton", but as developers we know what it is.
There are ways to approximate it in each language, but the compiler
does not (cannot) check the correctness of your program.  It can only
approximate it by asserting that the bit patterns shoved around are of
the size (and to a limited extent, the order) that you said you
wanted.
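
To make the "string" case concrete, here is a sketch in Python that
treats a bytes object as the raw character array.  The function names
are hypothetical.  Both buffers below have the same language-level
type (bytes), but only one of them belongs to the "string" subset.

```python
def is_c_string(buf: bytes) -> bool:
    """True if buf is in the 'string' subset: it contains a NUL terminator."""
    return b"\x00" in buf

def c_string_value(buf: bytes) -> bytes:
    """Return the bytes before the first NUL, as a strlen-style reader sees them."""
    if not is_c_string(buf):
        raise ValueError("not NUL-terminated: outside the 'string' subset")
    return buf[:buf.index(b"\x00")]

print(is_c_string(b"hello\x00"))       # terminated
print(is_c_string(b"hello"))           # same type, but not a "string"
print(c_string_value(b"hi\x00junk"))   # everything after the NUL is ignored
```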


I feel I should say that I can see you understand C (and perhaps other
languages) quite well and you do realize that "char*" is an incomplete
description of what a string is.  I believe that you understand what I
am saying, but the word "type" is not defined well enough nor
universally enough, which is what fuels this discussion.

-D

[1]  'token' in VDM-SL is a way of saying "this entity is a thing and
     I can determine whether it is equal to another of these things,
     but that is all I know",  it is analogous to 'java.lang.Object'
     in java or 'Any' in eiffel.

-- 

But As for me and my household, we will serve the Lord.
        Joshua 24:15


