02 - Compilation from Source & Usage
Python Bindings and Frontend
This component provides Python bindings, built with pybind11, for constructing SDFGs from Python.
It also provides the @native decorator, which automatically converts Python functions
into SDFGs and generates native code for the available target.
- Bindings: Build SDFGs programmatically with Python
- AST Parser: Automatically compile Python/NumPy code to optimized native code
- Targets: Support for CPU (sequential, OpenMP, SIMD via Highway), CUDA GPUs, and other accelerators
Build from Sources
To build the Python component from source, run pip on the component's directory:
pip install -e python/

Requirements
Python Dependencies
- Python >= 3.11
- NumPy >= 1.19.0
- SciPy >= 1.7.0
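For example, the Python packages can be installed with pip using the same minimum versions:

pip install "numpy>=1.19.0" "scipy>=1.7.0"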
System Dependencies
The following system dependencies must be installed (same as core components):
sudo apt-get install -y libgmp-dev libzstd-dev
sudo apt-get install -y nlohmann-json3-dev
sudo apt-get install -y libboost-graph-dev libboost-graph1.74.0
sudo apt-get install -y libisl-dev

Compiler Requirements
Clang/LLVM 19 is required for code generation. Install it with:
# Ubuntu/Debian
sudo apt-get install -y clang-19 llvm-19

For CUDA support, you also need the NVIDIA CUDA Toolkit installed.
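To verify that the toolchain is available, you can check the installed versions. These are standard LLVM/CUDA commands, not docc-specific ones; the nvcc check only applies if you need CUDA support:

clang-19 --version
llvm-config-19 --version   # provided by the llvm-19 package
nvcc --version             # only needed for target="cuda"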
Usage
The @native Decorator
The @native decorator is the primary way to use the Python frontend. It automatically:
- Parses the Python function's AST
- Converts it to an SDFG representation
- Applies optimizations based on the target
- Compiles to native code
- Executes and returns results
Basic Example
from docc.python import native
import numpy as np

@native
def vector_add(A, B, C, N):
    for i in range(N):
        C[i] = A[i] + B[i]

# Usage
N = 1024
A = np.random.rand(N).astype(np.float64)
B = np.random.rand(N).astype(np.float64)
C = np.zeros(N, dtype=np.float64)

vector_add(A, B, C, N)  # JIT compiles and executes

Compilation Targets
The @native decorator accepts a target parameter to specify the code generation backend:
target="none" (Default)
No scheduling or optimization is applied. The SDFG is compiled as-is without parallelization. Useful for debugging or when you want to manually control the generated code.
@native(target="none")def simple_loop(A, B, N): for i in range(N): B[i] = A[i] * 2.0target="sequential"
Generates optimized sequential code with SIMD vectorization using Google Highway. This target automatically vectorizes eligible loops using portable SIMD intrinsics.
import math

@native(target="sequential")
def vectorized_sin(A, B):
    for i in range(A.shape[0]):
        B[i] = math.sin(A[i])

# Highway automatically vectorizes the sin computation
N = 128
A = np.random.rand(N).astype(np.float64)
B = np.zeros(N, dtype=np.float64)
vectorized_sin(A, B)

Supported Highway operations include:
- Trigonometric: sin, cos, asin, acos, atan, atan2
- Exponential: exp, log, log10, log2, pow
- Hyperbolic: sinh, cosh, tanh, asinh, acosh, atanh
- Other: sqrt, abs, floor, ceil, round
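As a sketch building on the vectorized_sin example above, a kernel that combines several of the listed operations (here sqrt and exp) should also be eligible for Highway vectorization; the function name and the specific expression are illustrative, not part of the API:

import math
import numpy as np
from docc.python import native

@native(target="sequential")
def vectorized_mix(A, B):
    for i in range(A.shape[0]):
        # sqrt and exp are both in the supported operation set
        B[i] = math.sqrt(A[i]) * math.exp(-A[i])

N = 128
A = np.random.rand(N).astype(np.float64)
B = np.zeros(N, dtype=np.float64)
vectorized_mix(A, B)

Operations outside the list above may not be vectorized.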
target="openmp"
Generates parallel code using OpenMP. Suitable for multi-core CPUs. Loops are automatically parallelized with appropriate scheduling.
@native(target="openmp", category="desktop")def parallel_add(A, B, C, N): for i in range(N): C[i] = A[i] + B[i]
# Executes in parallel across CPU coresN = 1000000A = np.random.rand(N).astype(np.float64)B = np.random.rand(N).astype(np.float64)C = np.zeros(N, dtype=np.float64)parallel_add(A, B, C, N)target="cuda"
Generates CUDA code for NVIDIA GPUs. Loops are mapped to GPU thread blocks and threads.
@native(target="cuda", category="server")def gpu_add(A, B, C, N): for i in range(N): C[i] = A[i] + B[i]
# Executes on GPU (data is automatically transferred)N = 1024A = np.random.rand(N).astype(np.float64)B = np.random.rand(N).astype(np.float64)C = np.zeros(N, dtype=np.float64)gpu_add(A, B, C, N)The category Parameter
The category parameter provides hints to the scheduler about the target hardware:
"edge""desktop""server"
@native(target="openmp", category="desktop")def cpu_kernel(A, B): ...
@native(target="cuda", category="server")def gpu_kernel(A, B): ...Advanced: Manual Compilation
For more control, you can manually compile and cache the SDFG:
@native(target="openmp")def my_kernel(A, B, C, N): for i in range(N): C[i] = A[i] + B[i]
# Pre-compile with sample argumentscompiled = my_kernel.compile(A, B, C, N)
# Reuse compiled versionresult = compiled(A, B, C, N)
# Access the underlying SDFGsdfg = my_kernel.last_sdfgLicense
This component is part of docc and is published under the BSD-3-Clause license. See LICENSE for details.