Code autogeneration

The xsuite library uses code autogeneration to specialize kernel code for the different contexts. Three contexts are presently available: CPU, CUDA, and OpenCL.

The developer writes a single C source code using the portability macros provided by xobjects/headers/common.h. The preferred macro API includes GPUFUN for functions callable on the GPU device, GPUKERN for kernels, GPUGLMEM for pointers to GPU global memory, RESTRICT for restrict qualifiers, and VECTORIZE_OVER / END_VECTORIZE for context-dependent loops. Older sources may still use the comment strings described below; these legacy annotations are kept for compatibility but should not be used in new handwritten C code. With macros, typos are caught by the compiler instead of being silently ignored as unknown comments.

`VECTORIZE_OVER` block

The preferred macro syntax is the following:

VECTORIZE_OVER(myvar, myvarlim);

    [MY CODE]

END_VECTORIZE;

This is translated into a for loop in the CPU implementation and into a single-particle block in the parallel implementations (cupy, pyopencl). Older sources may use the legacy //vectorize_over and //end_vectorize comments for the same purpose.

The corresponding CPU code will be:

for (int myvar=0; myvar<myvarlim; myvar++){ //autovectorized
    [MY CODE]
} //end autovectorized

The corresponding CUDA code will be:

int myvar; //autovectorized
myvar = blockDim.x * blockIdx.x + threadIdx.x; //autovectorized
if (myvar<myvarlim) { //autovectorized
    [MY CODE]
} //end autovectorized

The corresponding OpenCL code will be:

int myvar; //autovectorized
myvar = get_global_id(0); //autovectorized
    [MY CODE]
//end autovectorized

Context specific code guards

Context-specific code should be guarded with the XO_CONTEXT_* macros defined by Xobjects at compile time. The available context macros are:

XO_CONTEXT_CPU: Defined for both CPU contexts.
XO_CONTEXT_CPU_SERIAL: Defined for the serial CPU context.
XO_CONTEXT_CPU_OPENMP: Defined for the OpenMP CPU context.
XO_CONTEXT_CUDA: Defined for the CUDA GPU context.
XO_CONTEXT_CL: Defined for the OpenCL GPU context.

For example, CPU-only code can be written as:

#ifdef XO_CONTEXT_CPU
#include <math.h>
#endif

and OpenMP-specific code can be written as:

#ifdef XO_CONTEXT_CPU_OPENMP
#pragma omp parallel for
#endif

Older sources may still use the legacy //only_for_context directive. New handwritten C code should use the XO_CONTEXT_* macros instead so that the code is more readable and so that the typos are caught by the compiler.

`GPUFUN` directive

GPUFUN marks a C function that can be called from the kernel code. On CUDA it expands to __device__; on CPU it expands to static inline. Use it for helper functions and element tracking functions that need to work across CPU and GPU contexts.

Legacy C sources can use the /*gpufun*/ directive for the same purpose. New code should include xobjects/headers/common.h and use the GPUFUN macro instead.

`GPUKERN` directive

GPUKERN marks an entry-point kernel function launched by an Xobjects context. On CUDA it expands to __global__; on OpenCL it expands to __kernel; on CPU it is empty.

Legacy C sources can use the /*gpukern*/ directive for the same purpose. New code should include xobjects/headers/common.h and use the GPUKERN macro instead.

`GPUGLMEM` directive

GPUGLMEM marks a pointer as referring to global memory in GPU contexts. It expands to __global on OpenCL and is empty on CUDA and CPU.

Legacy C sources can use the /*gpuglmem*/ directive for the same purpose. New code should include xobjects/headers/common.h and use the GPUGLMEM macro instead.

Code autogeneration

VECTORIZE_OVER block

Context specific code guards

GPUFUN directive

GPUKERN directive

GPUGLMEM directive

`VECTORIZE_OVER` block

`GPUFUN` directive

`GPUKERN` directive

`GPUGLMEM` directive