02 - Run Your First Code
This guide demonstrates how to offload a simple vector addition kernel to different backends like CUDA and Tenstorrent using docc. The example code is written in standard C and can be found on GitHub.
The following C code implements a WAXPBY operation, a scaled vector addition that computes w = alpha * x + beta * y.
```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 8194

int main(int argc, char** argv) {
    float* x = (float*)malloc(N * sizeof(float));
    float* y = (float*)malloc(N * sizeof(float));
    float* w = (float*)malloc(N * sizeof(float));

    // Initialize arrays
    float alpha = 2.0f;
    float beta = 3.0f;
    for (int i = 0; i < N; i++) {
        x[i] = (float)i;
        y[i] = (float)(N - i);
        w[i] = 0.0f;
    }

    double start = omp_get_wtime();

    // Perform waxpby operation: w = alpha * x + beta * y
    for (int i = 0; i < N; i++) {
        w[i] = alpha * x[i] + beta * y[i];
    }

    double end = omp_get_wtime();
    printf("Elapsed time: %f s\n", end - start);

    // Print the first 32 results
    for (int i = 0; i < 32; i++) {
        printf("w[%d] = %f, ", i, w[i]);
    }
    printf("\n");

    free(x);
    free(y);
    free(w);

    return 0;
}
```

To compile this example with docc, use the following command. It mirrors a standard clang or gcc invocation:
```bash
docc -g -O3 example_01.c -o example_01.out
./example_01.out
```

docc can also automatically parallelize your code for multi-core CPUs. To enable this, use the OpenMP tuning mode:
```bash
docc -g -O3 -docc-tune=openmp example_01.c -o example_01.out
```

Cross-compiling for CUDA and Tenstorrent
Running code on accelerators such as NVIDIA GPUs or Tenstorrent devices typically requires rewriting kernels in CUDA or programming against vendor-specific APIs. With docc, you only need to change a single compiler flag.
CUDA Backend:
```bash
docc -g -O3 -docc-tune=cuda example_01.c -o example_01.out
```

Tenstorrent Backend:

```bash
docc -g -O3 -docc-tune=tenstorrent example_01.c -o example_01.out
```