Re: multi CPU's

To: Andreas Bombe <andreas.bombe@munich.netsurf.de>
Cc: Tom Rothamel <tom-11053@onegeek.org>, Ben Collins <bcollins@debian.org>, debian-devel@lists.debian.org
Subject: Re: multi CPU's
From: Eray Ozkural <erayo@cs.bilkent.edu.tr>
Date: Sat, 08 Apr 2000 10:34:18 +0300
Message-id: <38EEE0F9.A4805C2F@cs.bilkent.edu.tr>
References: <3.0.5.32.20000405100900.0085f420@mailer.dc1.net> <20000405102033.E274@visi.net> <38EBB115.414C7DD5@cs.bilkent.edu.tr> <20000406035840.31725.qmail@onegeek.org> <38EC1256.588D5C50@cs.bilkent.edu.tr> <20000408003542.A1121@storm.local>

Andreas Bombe wrote:
> 
> Nope.  SMP is a pretty hefty change.  A lot of low level Linux code is
> inlined.  The spin locks differ a lot between UP and SMP, they are
> mostly no-ops on UP and the full asm sequence in SMP, same for the
> spin lock structures which have all the data in SMP and have zero
> length on UP (or one int length for gcc 2.7.2.3).
> 

Well, yes. Spin-locks are only for MT programming. They don't make
sense when you have 1 processor.

> That means a) SMP kernels have a lot of additional asm spread over the
> kernel and b) every code that accesses a structure that contains a
> spinlock structure could have to use different offsets to access a
> field on UP and SMP.  Because of b) you should also never try to force
> loading of SMP modules on UP and vice versa, it might get you memory
> corruption[1].
>

I guess so. When they first started on the SMP support, they just threw in
locks around every shared data structure. Kind of crude MT. They've
improved it though.

Both (a) and (b) may be problematic. That is the kind of conditional
compilation that changes the binary a lot.
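
Point (b) is easy to see in miniature. Here is a rough sketch, not kernel
code: `demo_spinlock_t`, `struct dev_smp` and `struct dev_up` are made-up
names standing in for a structure built with and without its lock member.
The same field ends up at a different offset in each build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock type; stands in for a real spinlock structure. */
typedef struct { volatile unsigned int lock; } demo_spinlock_t;

/* Layout seen by code built with the lock member compiled in. */
struct dev_smp {
    demo_spinlock_t lock;
    int count;
};

/* Layout seen by code built with the lock member compiled out. */
struct dev_up {
    int count;
};

/* Offset of `count` under each layout. */
size_t count_offset_smp(void) { return offsetof(struct dev_smp, count); }
size_t count_offset_up(void)  { return offsetof(struct dev_up, count); }

/* The two builds disagree about where `count` lives. */
void check_layout_mismatch(void) {
    assert(count_offset_smp() != count_offset_up());
}
```

Code compiled against one layout and handed a structure with the other
would read or write the wrong bytes, which is the corruption risk in (b).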

> > Though I'm pretty sure those preprocessor symbols are then scattered
> > all over the megs of source code. But that seems to be less than 1000.
> > [I've checked].
> 
> That's the Linux design philosophy.  Put the necessary #ifdefs into
> header files to define macros which to avoid #ifdefs in the actual C
> files.  That means that your count is multiplied by the count of each
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> macro use.
^^^^^^^^^^^^

Ha ha ha! Excuse me? Are you suggesting that cpp makes multiple passes to
expand preprocessor symbols once again? I hadn't heard this before :)

So let's see, I write

<<
#include <stdio.h>
#define ENTER_SUCKY #ifdef __SUCKY__
#define EXIT_SUCKY #endif

ENTER_SUCKY
printf("I suck completely\n");
EXIT_SUCKY
>>
in a file called suck.c

orion:tmp$ gcc suck.c 
suck.c:5: undefined or invalid # directive
suck.c:6: parse error before string constant
suck.c:6: warning: data definition has no type or storage class
suck.c:7: undefined or invalid # directive

Oops. Macros can't define preprocessor directives, sorry. That could be
a pragma though. Although the gcc info says some strange stuff about pragmas,
they can be useful. OpenMP is a good example of proper #pragma use.
You know gcc first calls cpp and then the preprocessed code is fed into the C
compiler, which then does... and so the story goes. It could be any # directive
known to the C compiler and not to the C preprocessor.
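
For contrast with suck.c, the arrangement cpp does accept is the reverse
one: the directive selects the macro body, and ordinary code then uses the
macro without any #ifdef of its own. A minimal sketch, where `DEMO_SMP`,
`struct demo_lock`, `demo_lock_acquire` and `demo_lock_release` are all
made-up names and the "lock" is only a flag, not a real spin lock:

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: a flag only, with no atomicity. Purely illustrative. */
struct demo_lock { int held; };

/* The #ifdef lives in one place and picks the macro bodies. */
#ifdef DEMO_SMP
#define demo_lock_acquire(l) ((l)->held = 1)
#define demo_lock_release(l) ((l)->held = 0)
#else
#define demo_lock_acquire(l) ((void)(l))   /* no-op in the other build */
#define demo_lock_release(l) ((void)(l))
#endif

/* Ordinary code: no conditional compilation visible here. */
int bump(struct demo_lock *l, int *counter) {
    assert(l != NULL && counter != NULL);
    demo_lock_acquire(l);
    ++*counter;
    demo_lock_release(l);
    return *counter;
}
```

Building with -DDEMO_SMP changes the code generated at every use of the
two macros, even though the source shows a single #ifdef.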

You'll need to run a second macro processor to do that; there are zillions
of ways people do it. But AFAIK that's a very bad design philosophy.
Actually, I wouldn't want to stumble upon a piece of code that makes heavy
use of code expansion macros. Check VC++. Hackers are disgusted by it because
MS coders tried to make new semantics with preprocessor macros (for message
handlers and stuff). Think about a piece of meta-C code that's first processed
with M4. Oh, nightmare. The proper use of preprocessor symbols should be limited
to only very well defined and consistent utilities, constants and portability
flags. Thank god I write C++ only: I rarely use preprocessor directives.
I use them only for plain conditional compilation.

That one's for your remarks on my counting abilities! :) Now, I'm a TM and
I can count pretty well. Upon your critique, I took a closer look at the
kernel code regarding SMP configuration. It seems that they've just
used the __SMP__ symbol for that. My estimate was the following:
assume there are 768 SMP critical regions; if 256 bytes of code are generated
on average for the conditional code, that makes 192KB. The second relevant
change in the binary is that offsets of code will change, and that affects
the function pointers. But there shouldn't be too many, since I don't think
Linus is very comfortable with jump vectors, ha ha. I've read some of his
words on high level languages, god... He can be really arrogant.
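
The arithmetic behind that figure, as a small sketch. The function names
are made up for illustration; the inputs are just the assumed 768 regions
and 256 bytes per region from above, not measured values.

```c
#include <assert.h>

/* Estimated extra code, in bytes, for `regions` conditional regions
   that each generate `bytes_per_region` bytes on average. */
unsigned long smp_overhead_bytes(unsigned long regions,
                                 unsigned long bytes_per_region) {
    unsigned long total = regions * bytes_per_region;
    /* Guard against wraparound in the multiplication. */
    assert(regions == 0 || total / regions == bytes_per_region);
    return total;
}

/* Same figure in KB (1 KB = 1024 bytes), rounded down. */
unsigned long smp_overhead_kb(unsigned long regions,
                              unsigned long bytes_per_region) {
    return smp_overhead_bytes(regions, bytes_per_region) / 1024UL;
}
```

With the numbers above, smp_overhead_bytes(768, 256) is 196608 bytes and
smp_overhead_kb(768, 256) is 192.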

I didn't find a very easy way to report the size of all #ifdef __SMP ...
#endif's, but those shouldn't be very long. Ah, another thing. If the SMP
stuff changes the offsets of common data structures, that would make a big
difference in the code. But I guess it's not that pervasive. (You had mentioned
that as (b) regarding what CONFIG_SMP does to the binaries.)

> 
> [1] Modules must always be compiled with the same compiler and kernel
>     options as the kernel used.  Otherwise you really ask for
>     problems.  Don't install separately downloaded binary modules.
> 

Yep, yep, yep, I guess. It's so easy to make your MT code blow up.  That's
why I think there should be this standard kernel-image binary with SMP support
in it. For convenience.

OTOH, what you say is of course 100% correct. If there's a macro that handles
some SMP stuff, and that macro is used in every device driver and the like, then
there'd be an explosion, which would surprise me. But I didn't see such a
magic preprocessor macro. The extensive inlining would be a problem, because
it could cause code explosion. [Which you mentioned as (a).] Check this:

you know where this beast comes from :)

extern inline void down(struct semaphore * sem)
{
        __asm__ __volatile__(
                "# atomic down operation\n\t"
#ifdef __SMP__
                "lock ; "
#endif
                "decl (%0)\n\t"     /* --sem->count */
                "js 2f\n"
                "1:\n"
                ".section .text.lock,\"ax\"\n"
                "2:\tcall __down_failed\n\t"
                "jmp 1b\n"
                ".previous"
                :/* no outputs */
                :"c" (sem)
                :"memory");
}

that #ifdef there inside the inline causes the effect you're talking about.
But again, that change ain't much.

exa@borg:/usr/src/linux$ grep -e down\(.*\) -r . | wc -l
    626
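
A back-of-the-envelope sketch of what that means for this one inline. It
assumes every grep hit is one inlined copy of down(), which surely
over-counts, and uses the fact that the x86 `lock` prefix encodes as a
single byte (0xF0). The names are made up for illustration.

```c
#include <assert.h>

/* The x86 `lock` prefix is a one-byte instruction prefix (0xF0). */
enum { LOCK_PREFIX_BYTES = 1 };

/* Extra bytes if each of `call_sites` inlined copies gains one prefix. */
unsigned long lock_prefix_overhead(unsigned long call_sites) {
    unsigned long total = call_sites * LOCK_PREFIX_BYTES;
    assert(total >= call_sites);   /* at least one byte per site */
    return total;
}
```

lock_prefix_overhead(626) gives 626 bytes, so the #ifdef inside down()
by itself adds well under 1KB.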

In light of this, let me suggest the following. The 192KB estimate (which
is perhaps excessive given how it was arrived at) is inaccurate, so my new
estimate is 768KB. That will be compressed, so it makes ~384K. If that's
achievable, then my point still holds.

Sincerely,

-- 
 ++++-+++-+++-++-++-++--+---+----+----- ---  --  -  - 
 +  Eray "eXa" Ozkural                   .      .   .  . . .
 +  CS, Bilkent University, Ankara             ^  .  o   .      .
 |  mail: erayo@cs.bilkent.edu.tr                .  ^  .   .
