02 - Run Your First Code

This guide demonstrates how to offload a simple vector addition kernel to different backends, such as CUDA and Tenstorrent, using docc. The example code is written in standard C and can be found on GitHub.

The following C code implements a simple scaled vector addition (WAXPBY), w = alpha * x + beta * y.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 8194

int main(int argc, char** argv) {
    float* x = (float*)malloc(N * sizeof(float));
    float* y = (float*)malloc(N * sizeof(float));
    float* w = (float*)malloc(N * sizeof(float));

    // Initialize arrays
    float alpha = 2.0f;
    float beta = 3.0f;
    for (int i = 0; i < N; i++) {
        x[i] = (float)i;
        y[i] = (float)(N - i);
        w[i] = 0.0f;
    }

    double start = omp_get_wtime();

    // Perform waxpby operation: w = alpha * x + beta * y
    for (int i = 0; i < N; i++) {
        w[i] = alpha * x[i] + beta * y[i];
    }

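    double end = omp_get_wtime();

    // Report the wall-clock time spent in the waxpby loop
    printf("Elapsed time: %f s\n", end - start);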

    // Print the result
    for (int i = 0; i < 32; i++) {
        printf("w[%d] = %f, ", i, w[i]);
    }
    printf("\n");

    free(x);
    free(y);
    free(w);

    return 0;
}
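
With the initialization above (x[i] = i, y[i] = N - i, alpha = 2, beta = 3, N = 8194), the expected result has a simple closed form, which is handy for checking the printed values by eye:

$$
w_i = \alpha x_i + \beta y_i = 2i + 3(N - i) = 3N - i = 24582 - i
$$

So w[0] should print as 24582.000000, and each subsequent element should be exactly one less. All of these values are small integers, so they are represented exactly in single precision.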

To compile this example with docc, use the following command, which takes the same form as a standard clang or gcc invocation:

docc -g -O3 example_01.c -o example_01.out
./example_01.out

docc can automatically parallelize your code for multi-core CPUs. To enable this, simply use the OpenMP tuning mode:

docc -g -O3 -docc-tune=openmp example_01.c -o example_01.out
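
For comparison, the following is a minimal sketch of what parallelizing the loop by hand with OpenMP might look like. It is not necessarily the code docc generates; the function name `waxpby_omp` is hypothetical and only serves to isolate the loop from the example.

```c
#include <stddef.h>

/* Hand-written OpenMP version of the waxpby loop, for comparison only.
   Assumption: this is an illustration, not the output of docc.
   waxpby_omp is a hypothetical name, not part of the original example. */
void waxpby_omp(float* w, const float* x, const float* y,
                float alpha, float beta, int n) {
    /* Each iteration writes a distinct w[i] and only reads x[i] and y[i],
       so the iterations are independent and can be split across threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        w[i] = alpha * x[i] + beta * y[i];
    }
}
```

The loop is a good candidate for automatic parallelization because it has no dependencies between iterations. Without OpenMP enabled in the compiler, the pragma is ignored and the function runs serially with the same result.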

Cross-compiling for CUDA and Tenstorrent

Typically, running code on accelerators like NVIDIA GPUs or Tenstorrent devices requires rewriting kernels in CUDA or using specialized APIs. With docc, you can target these devices from the same C source by changing a single compiler flag.

CUDA Backend:

docc -g -O3 -docc-tune=cuda example_01.c -o example_01.out

Tenstorrent Backend:

docc -g -O3 -docc-tune=tenstorrent example_01.c -o example_01.out
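
When switching between backends, it can be useful to confirm that each build produces the same result. The following is a minimal sketch of such a check, assuming it is called on the host after the waxpby loop has finished. The helper `count_waxpby_mismatches` and its tolerance parameter are hypothetical and not part of the original example.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the original example): recomputes
   alpha * x[i] + beta * y[i] on the host and returns how many elements
   of w differ from that reference value by more than tol. */
size_t count_waxpby_mismatches(const float* w, const float* x, const float* y,
                               float alpha, float beta, size_t n, float tol) {
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        float expected = alpha * x[i] + beta * y[i];
        float diff = w[i] - expected;
        if (diff < 0.0f) {
            diff = -diff; /* absolute difference without needing math.h */
        }
        if (diff > tol) {
            mismatches++;
        }
    }
    return mismatches;
}
```

In the example program, this could be called as `count_waxpby_mismatches(w, x, y, alpha, beta, N, 1e-5f)` before the arrays are freed; a return value of 0 indicates that the backend reproduced the reference result within the chosen tolerance.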