Apple Developer Connection

Accelerate.framework errata

Below is a list of known issues with Accelerate.framework / vecLib that may cause your application to operate incorrectly:

vsub
LAPACK thread safety
Fortran calling vecLib's CDOTC, CDOTU, ZDOTC, and ZDOTU.
vImage Scale operations
vImage Shear operations

vsub

In MacOS X 10.2.7 (G5 only) and MacOS X 10.3.0 (G4 and G5 only), vsub and vsubd swap their two input arguments: rather than computing c = b - a over the length of the array, they compute c = a - b. All other revisions of MacOS X, including 10.2.8, are unaffected, and G3 is unaffected on any OS. There are two ways to work around this problem:

workaround 1

Determine the version of MacOS X at run time and swap the arguments only on the affected releases.

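#include <stdint.h>
#include <CoreServices/CoreServices.h>
#include <vecLib/vDSP.h> //OR #include <Accelerate/Accelerate.h>

// Computes c = b - a, swapping the inputs only where vsub is known to swap them.
void Workaround_vsub( float *a, int aStride, float *b, int bStride, float *c, int cStride, int size )
{
    static SInt32 cpuFeatures = 0;  // cached gestaltPowerPCProcessorFeatures bits
    static SInt32 version = 0;      // cached gestaltSystemVersion, e.g. 0x1030
    float *temp;
    const SInt32 kSMEAGOL = 0x1027;     // MacOS X 10.2.7
    const SInt32 kPANTHER_0 = 0x1030;   // MacOS X 10.3.0

    //Do this only if the vector code is to be called: unit strides, at least
    //8 elements, and a, b and c sharing the same 16-byte alignment
    if( (1 == aStride) && (1 == bStride) && (1 == cStride) &&
        (8 <= size) &&
        ( ((uintptr_t) a & 15) == ((uintptr_t) b & 15) ) &&
        ( ((uintptr_t) a & 15) == ((uintptr_t) c & 15) ) )
    {
        //Only call Gestalt once (features first, so version != 0 implies both are set)
        if( 0 == version )
        {
            Gestalt(gestaltPowerPCProcessorFeatures, &cpuFeatures);
            Gestalt(gestaltSystemVersion, &version);
        }

        //Swap the arguments if necessary. A G3 has no AltiVec, always takes
        //the scalar path, and is unaffected, so it must not be swapped.
        if( ((version == kSMEAGOL) || (version == kPANTHER_0)) &&
            (cpuFeatures & (1 << gestaltPowerPCHasVectorInstructions)) )
        { temp = a; a = b; b = temp; }
    }

    vsub(a, aStride, b, bStride, c, cStride, size);
}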

You can also use sysctl to determine the OS revision. This may be lighter weight for Mach-O applications, since sysctl is part of libSystem and does not require linking CoreServices.
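A minimal sketch of the sysctl approach follows. It reads the Darwin kernel release string (kern.osrelease) with sysctlbyname() and checks it against the affected releases. The mapping it relies on (Darwin 6.x is MacOS X 10.2.x and Darwin 7.0 is MacOS X 10.3.0, so 10.2.7 reports "6.7" and 10.3.0 reports "7.0.0") is an assumption that is not stated in this note; verify it on your target systems. The helper names are hypothetical. Like the Gestalt call, this only identifies the OS revision; the other conditions in workaround 1 still apply.

```c
#include <stdio.h>
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns 1 if a Darwin release string such as "7.0.0"
   names a release with the vsub argument swap, 0 otherwise.
   ASSUMPTION: Darwin 6.7 == MacOS X 10.2.7 and Darwin 7.0 == MacOS X 10.3.0. */
int DarwinReleaseHasVsubBug(const char *release)
{
    int major = 0, minor = 0;

    /* kern.osrelease looks like "6.8" or "7.2.0"; only major.minor matter. */
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;

    return (major == 6 && minor == 7) || (major == 7 && minor == 0);
}

#ifdef __APPLE__
/* Hypothetical helper: queries the running kernel.
   Returns 1 (affected), 0 (unaffected), or -1 if sysctl fails. */
int RunningOSHasVsubBug(void)
{
    char release[32];
    size_t len = sizeof(release);

    if (sysctlbyname("kern.osrelease", release, &len, NULL, 0) != 0)
        return -1;

    return DarwinReleaseHasVsubBug(release);
}
#endif
```

Keeping the string check separate from the sysctl call lets it be tested on known release strings without running on an affected system.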

workaround 2

Another way is to use vsmul() to negate a and then use vadd():

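#include <vecLib/vDSP.h> //OR #include <Accelerate/Accelerate.h>

// Computes c = b - a as (-1 * a) + b. Note: c must not overlap b, because
// the first pass overwrites c before b is read. c may be the same array as a.
void Workaround_vsub( float *a, int aStride, float *b, int bStride, float *c, int cStride, int size )
{
    const float minusOne = -1.0f;

    // c = -a
    vsmul(a, aStride, &minusOne, c, cStride, size);

    // c = c + b, i.e. b - a
    vadd( c, cStride, b, bStride, c, cStride, size );
}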

This method is likely to be slower, because it makes two passes over the data instead of one.
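When testing either workaround, it can help to compare against a portable scalar version of the intended behavior. The function below is a hypothetical helper, not part of vecLib: it computes c = b - a element by element with the same argument order and strides as vsub, assuming positive strides.

```c
/* Hypothetical scalar reference for the documented vsub semantics:
   c[i*cStride] = b[i*bStride] - a[i*aStride] for i = 0 .. size-1.
   Positive strides are assumed. */
void ReferenceVsub(const float *a, int aStride,
                   const float *b, int bStride,
                   float *c, int cStride, int size)
{
    int i;

    for (i = 0; i < size; i++)
        c[i * cStride] = b[i * bStride] - a[i * aStride];
}
```

On an affected system, a vsub result that matches ReferenceVsub with a and b exchanged indicates the swap described above.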

LAPACK thread safety

MacOS X applications that intend to call the LAPACK linear algebra APIs from multiple threads must take the following precautions to ensure correct results. LAPACK is part of the Accelerate and vecLib frameworks. Prototypes for its APIs can be found in:

/System/Library/Frameworks/vecLib.framework/Headers/clapack.h

In MacOS X Release 10.2, LAPACK is not thread-safe. Applications that intend to call the LAPACK APIs from multiple threads must implement their own locking discipline to prevent simultaneous execution of LAPACK routines.
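One possible locking discipline for MacOS X 10.2 is a single process-wide mutex that every thread holds for the duration of each LAPACK call. The sketch below assumes POSIX threads; LockLAPACK and UnlockLAPACK are hypothetical helper names, not vecLib APIs.

```c
#include <pthread.h>

/* One process-wide lock that serializes every LAPACK call (MacOS X 10.2).
   LockLAPACK/UnlockLAPACK are hypothetical helpers, not part of vecLib. */
static pthread_mutex_t gLAPACKMutex = PTHREAD_MUTEX_INITIALIZER;

/* Acquire before calling any LAPACK routine; returns 0 on success. */
int LockLAPACK(void)
{
    return pthread_mutex_lock(&gLAPACKMutex);
}

/* Release after the LAPACK routine returns; returns 0 on success. */
int UnlockLAPACK(void)
{
    return pthread_mutex_unlock(&gLAPACKMutex);
}

/* Usage, in any thread:

       if (LockLAPACK() == 0) {
           ... call LAPACK routines such as sgesv_ here ...
           UnlockLAPACK();
       }
*/
```

The lock only helps if every caller in the process goes through it; a LAPACK call made outside the lock can still run concurrently with a locked one.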

In MacOS X Release 10.3, LAPACK thread-safety is greatly enhanced. Applications that intend to call the LAPACK APIs from multiple threads must ensure that the following two initialization calls are completed before commencing simultaneous execution of LAPACK routines.

In C:

extern double slamch_(char *), dlamch_(char *);

(void) slamch_("e");
(void) dlamch_("e");

In FORTRAN:

      REAL A, SLAMCH
      DOUBLE PRECISION D, DLAMCH
      EXTERNAL SLAMCH, DLAMCH

      A = SLAMCH('e')
      D = DLAMCH('e')

Fortran calling vecLib's CDOTC, CDOTU, ZDOTC, and ZDOTU.

The FORTRAN entry points in Mac OS X's vecLib adhere to the call/return conventions of g77.

In particular, with g77, the return value of a COMPLEX or DOUBLE COMPLEX function is stored to memory through a pointer. The caller must take care to pass that pointer in PPC general purpose register R3 according to the g77 ABI.

With xlf (and the emerging g95), COMPLEX and DOUBLE COMPLEX function return values are left in the PowerPC floating point register file. Modern implementations of the C language use the same approach and no doubt gave impetus to this characteristic of modern FORTRAN.

Just four Level 1 BLAS functions are at issue: CDOTC, CDOTU, ZDOTC, and ZDOTU. Each returns a COMPLEX (or DOUBLE COMPLEX) value. When xlf compiles a function invocation into a call to one of these routines, it expects to find the *return* value in the floating point register file. When g77 compiles a function invocation into a call to one of these routines, it expects to find the return value in a pre-allocated *memory* location. The vecLib implementation of these four functions is compatible with the g77 scheme, but not the xlf scheme.
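The difference between the two conventions can be illustrated in C. The functions below are hypothetical illustrations, not vecLib entry points: both compute ZDOTC (the sum of conj(x[i]) * y[i]) using the reference BLAS rules for increments, but one stores the result through a hidden pointer passed as the first argument (the g77 scheme, where that pointer travels in R3), while the other returns the complex value directly (the xlf and C scheme).

```c
#include <complex.h>

/* g77-style (hypothetical illustration): the result is stored through a
   hidden pointer passed as the first argument. Scalars are passed by
   reference, as FORTRAN does. */
void zdotc_hidden_result(double complex *result, const int *n,
                         const double complex *zx, const int *incx,
                         const double complex *zy, const int *incy)
{
    double complex sum = 0.0;
    int i;
    /* Reference BLAS: a negative increment starts from the far end. */
    int ix = (*incx < 0) ? (1 - *n) * *incx : 0;
    int iy = (*incy < 0) ? (1 - *n) * *incy : 0;

    for (i = 0; i < *n; i++, ix += *incx, iy += *incy)
        sum += conj(zx[ix]) * zy[iy];

    *result = sum;
}

/* xlf/C-style (hypothetical illustration): the complex value is returned
   directly, so the compiler's own ABI decides where it lives (per the text
   above, the PowerPC floating point registers). */
double complex zdotc_value_return(const int *n,
                                  const double complex *zx, const int *incx,
                                  const double complex *zy, const int *incy)
{
    double complex result;

    zdotc_hidden_result(&result, n, zx, incx, zy, incy);
    return result;
}
```

A caller compiled for one scheme that links against code built for the other reads the result from the wrong place, which is exactly the mismatch between xlf callers and vecLib's g77-style CDOTC, CDOTU, ZDOTC, and ZDOTU.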

xlf codes may incorporate the following "wrappers", which re-implement CDOTC, CDOTU, ZDOTC, and ZDOTU in terms of utility *subroutines* already present in vecLib. Calling these vecLib subroutines from xlf involves no call/return ABI conflict. It is crucial, though, that the same compiler (e.g. xlf) compile both the callers of these replacements and the replacements themselves, so that the *function* return ABI matches. The utility subroutines (cblas_*_sub) are fully optimized for PowerPC.

!
! scp% /opt/ibmcmp/xlf/8.1/bin/xlf95 -o xlfabi xlfabi.f -Wl,-framework -Wl,vecLib
! ** abitest === End of Compilation 1 ===
! ** zdotc === End of Compilation 2 ===
! ** zdotu === End of Compilation 3 ===
! ** cdotc === End of Compilation 4 ===
! ** cdotu === End of Compilation 5 ===
! 1501-510 Compilation successful for file xlfabi.f.
! scp% ./xlfabi
! (0.000000000000000000E+00,-2.00000000000000000)
! (2.00000000000000000,0.000000000000000000E+00)
! (0.0000000000E+00,-2.000000000)
! (2.000000000,0.0000000000E+00)

      program abitest
      double complex zx(1), zy(1), ztemp
      double complex zdotc, zdotu
      complex cx(1), cy(1), ctemp
      complex cdotc, cdotu

      zx(1)=(1.0, 1.0)
      zy(1)=(1.0, -1.0)

      ztemp = zdotc(1, zx, 1, zy, 1)
      print *, ztemp

      ztemp = zdotu(1, zx, 1, zy, 1)
      print *, ztemp

      cx(1)=(1.0, 1.0)
      cy(1)=(1.0, -1.0)

      ctemp = cdotc(1, cx, 1, cy, 1)
      print *, ctemp

      ctemp = cdotu(1, cx, 1, cy, 1)
      print *, ctemp

      stop
      end

      double complex function zdotc(n, zx, incx, zy, incy)
      double complex zx(*), zy(*), z
      integer n, incx, incy

      call cblas_zdotc_sub(%val(n), zx, %val(incx), zy, %val(incy), z)

      zdotc = z
      return
      end

      double complex function zdotu(n, zx, incx, zy, incy)
      double complex zx(*), zy(*), z
      integer n, incx, incy

      call cblas_zdotu_sub(%val(n), zx, %val(incx), zy, %val(incy), z)

      zdotu = z
      return
      end

      complex function cdotc(n, cx, incx, cy, incy)
      complex cx(*), cy(*), c
      integer n, incx, incy

      call cblas_cdotc_sub(%val(n), cx, %val(incx), cy, %val(incy), c)

      cdotc = c
      return
      end

      complex function cdotu(n, cx, incx, cy, incy)
      complex cx(*), cy(*), c
      integer n, incx, incy

      call cblas_cdotu_sub(%val(n), cx, %val(incx), cy, %val(incy), c)

      cdotu = c
      return
      end

vImage Scale Operations

On MacOS X 10.3.0 through 10.3.2, the vImage Scale function may fail to translate the image correctly in the vertical direction while scaling it. The result can be a resized image that is also shifted, with the last row of pixels stretched to fill part of the image. We recommend using the Affine Warp function instead, which does not have this problem. Using the low-level shear functions to do the scaling may be slightly faster, since that is a two-pass algorithm rather than a three-pass one.

vImage Shear Operations

The 1D shear operations do not support the case where the destination buffer's size in the dimension orthogonal to the shear, plus the srcOffset in that dimension (if non-zero), exceeds the source buffer's size in that dimension. These functions attempt to fill the entire destination buffer. For example, in a horizontal shear, if the destination buffer is taller than the source buffer, a crash may occur: filling every destination scanline would require reading source scanlines that do not exist.

This limitation does not extend to size disparities in the shear dimension. In our horizontal shear example, if the width of the destination buffer is larger than the source buffer, the function handles the case gracefully, filling the residual space that does not map to any location in the source buffer with either the background color or the nearest edge pixel if kvImageEdgeExtend is used.
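A caller can test for the unsupported case before calling a 1D shear and fall back to AffineWarp when the test fails. The sketch below is a hypothetical helper, not part of vImage, and works on plain sizes. For a horizontal shear, pass the source and destination heights and the vertical source offset (assumed here to be the srcOffsetToROI_Y argument of the horizontal shear functions); for a vertical shear, pass the widths and the horizontal offset.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a 1D shear can fill the destination
   without reading past the source in the dimension orthogonal to the shear,
   0 otherwise. The shear dimension itself needs no check. */
int ShearOrthogonalExtentOK(size_t srcExtent, size_t destExtent, size_t srcOffset)
{
    /* Equivalent to destExtent + srcOffset <= srcExtent, written so the
       addition cannot overflow. */
    if (srcOffset > srcExtent)
        return 0;

    return destExtent <= srcExtent - srcOffset;
}
```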

We do support oversized destination buffers in the orthogonal dimension through the AffineWarp functionality. The 1D shears are intended to be low level bottleneck functions, and have a few limitations that the higher level functions do not have.



Copyright © 2004 Apple Computer, Inc.
All rights reserved.