
Re: fftw3 non-pic k7 optimisations



On Wed, Mar 02, 2005 at 07:25:00PM +0000, Paul Brossier wrote:
> On Wed, Mar 02, 2005 at 05:16:50PM +0100, Florian Weimer wrote:
> > * Paul Brossier:
> > 
> > > Two questions:
> > >  - can anyone spot what in these codelets causes the non-pic ?
> > 
> > Tables of constants are addressed directly, not in some IP-relative
> > way.
> 
> Thanks, I suspected this could be the problem. So, iiuc:
> 
> KP707106781KP707106781: .float +0.707106781186547524400844362104849039284835938, +0.707106781186547524400844362104849039284835938
> 
> is ok, but
> 
>         pfmul KP707106781KP707106781, %mm3
> 
> is not pic compliant, and should be replaced with something like
> 
>         pfmul KP707106781KP707106781(%ebx), %mm3
> 
> Given I didn't know anything about assembly yesterday, how far off am I?

Hello Paul,

You need to refer to KP707106781KP707106781 through the GOT, something
like

        pfmul KP707106781KP707106781@GOT(%ebx), %mm3

> > >  - how much can it hurt to have this non-pic in fftw3 ?
> > 
> > It shouldn't matter much if all non-PIC code is grouped together in the
> > binary, because then few pages have to be copied.  PIC code
> > itself is always slower, significantly so if the code is using all
> > available integer registers.
> 
> I did some tests running 1024-point FFTs on an AMD 700, and the
> difference between builds with and without --enable-k7 was a
> drop of roughly 1.5%. The drop could become larger on K7 with
> smaller numbers of points, where the k7/3dnow optimisations come in.
> 
> If there are no objections, and as suggested by upstream, I will
> disable the k7 optimisations in order to make sure that
> libfftw3f.so remains PIC compliant, despite being a little slower.

Why do you insist on having that code be position-independent?

On x86 that achieves very little. It is common for libraries including
hand-written asm code not to be position-independent on x86, because
position-independent asm code is slower and uglier, so the incentive
to write it is small.

(Sorry to be pedantic, but 'PIC compliant' is meaningless: PIC is not
a standard but an attribute.)

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 


