02 - Run Your First Code

This guide demonstrates how to compile a simple vector addition kernel with docc and offload it to different backends such as CUDA and Tenstorrent. The example code is written in standard C and can be found on GitHub.

The following C code implements WAXPBY, a scaled vector addition that computes w = alpha * x + beta * y.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 8194

int main(int argc, char** argv) {
    float* x = (float*)malloc(N * sizeof(float));
    float* y = (float*)malloc(N * sizeof(float));
    float* w = (float*)malloc(N * sizeof(float));

    // Initialize scalars and arrays
    float alpha = 2.0f;
    float beta = 3.0f;
    for (int i = 0; i < N; i++) {
        x[i] = (float)i;
        y[i] = (float)(N - i);
        w[i] = 0.0f;
    }

    double start = omp_get_wtime();
    // Perform the WAXPBY operation: w = alpha * x + beta * y
    for (int i = 0; i < N; i++) {
        w[i] = alpha * x[i] + beta * y[i];
    }
    double end = omp_get_wtime();

    // Report the elapsed time of the compute loop
    printf("Elapsed time: %f s\n", end - start);

    // Print the first 32 elements of the result
    for (int i = 0; i < 32; i++) {
        printf("w[%d] = %f, ", i, w[i]);
    }
    printf("\n");

    free(x);
    free(y);
    free(w);
    return 0;
}

To compile and run this example with docc, use the following commands. The compile line takes the same form as a standard clang or gcc invocation:

docc -g -O3 example_01.c -o example_01.out
./example_01.out
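
When switching between backends, it can help to confirm that every build produces the same numbers. With the inputs above (alpha = 2, beta = 3, x[i] = i, y[i] = N - i), each element should equal 3 * N - i, so the output should begin with w[0] = 24582.000000 and w[1] = 24581.000000. The following is a minimal sketch of a check that could be called after the compute loop. The helper name `count_waxpby_mismatches` and the tolerance parameter are assumptions made for illustration; they are not part of docc or the original example.

```c
#include <stddef.h>

/* Hypothetical helper (not part of docc or the original example):
 * counts the elements where w[i] deviates from the reference value
 * alpha * x[i] + beta * y[i] by more than tol. */
size_t count_waxpby_mismatches(const float* w, const float* x, const float* y,
                               float alpha, float beta, size_t n, float tol) {
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        float expected = alpha * x[i] + beta * y[i];
        float diff = w[i] - expected;
        if (diff < 0.0f) {
            diff = -diff; /* absolute difference without needing math.h */
        }
        if (diff > tol) {
            mismatches++;
        }
    }
    return mismatches;
}
```

In the example program, a call such as `count_waxpby_mismatches(w, x, y, alpha, beta, N, 1e-5f)` placed before the `free` calls should return 0 when the result is correct. The tolerance is left as a parameter because different devices may round floating-point results differently.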

docc can automatically parallelize your code for multi-core CPUs. To enable this, select the OpenMP tuning mode with the -docc-tune=openmp flag:

docc -g -O3 -docc-tune=openmp example_01.c -o example_01.out
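
For comparison, the sketch below shows roughly what parallelizing the same loop by hand with an OpenMP directive would look like. It is illustrative only: the function name `waxpby_openmp` is hypothetical, and this guide does not state what code docc generates internally, so it should not be read as docc's actual output.

```c
/* Hand-annotated WAXPBY loop (hypothetical name, for illustration only).
 * With OpenMP enabled, the iterations are divided among the threads of
 * the team; without OpenMP support the pragma is ignored and the loop
 * runs serially with the same result. */
void waxpby_openmp(float* w, const float* x, const float* y,
                   float alpha, float beta, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Each iteration writes a distinct w[i], so no synchronization
         * between iterations is needed. */
        w[i] = alpha * x[i] + beta * y[i];
    }
}
```

The loop is safe to parallelize this way because its iterations are independent of one another. With the tuning mode above, the original source needs no such annotation.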

Cross-compiling for CUDA and Tenstorrent

Typically, running code on accelerators like NVIDIA GPUs or Tenstorrent devices requires rewriting kernels in CUDA or using specialized APIs. With docc, you can target these devices by changing a single compiler flag.

CUDA Backend:

docc -g -O3 -docc-tune=cuda example_01.c -o example_01.out

Tenstorrent Backend:

docc -g -O3 -docc-tune=tenstorrent example_01.c -o example_01.out