Commits

List of commits on branch main.

  • 68638782c52035c572af61db346082732ffb7014: fix hw5 (ggwenzek, 2 years ago)
  • 51b764711c49388132138d1d5300209bf786a12f: fix lesson3 (ggwenzek, 2 years ago)
  • 298953de19bb60fcdf3d29a886184570f6303f96: fix hw4 (ggwenzek, 2 years ago)
  • 76d40b3249f3594d904c40888e3087d2bbaabe17: fix hw3 (ggwenzek, 2 years ago)
  • e9fc1e7af959d36494964f4124bf30115113ce21: introduce CudaKernel/ZigKernel, fix hw2 (ggwenzek, 2 years ago)
  • ae40471b0e9305866aad4d83b306538dbf785f4a: fix hw1 (ggwenzek, 2 years ago)

Cudaz

Overview

The main motivation for this project was to complete the assignments of Intro to Parallel Programming using as little C++ as possible.

The class is meant to use Cuda. Cuda is a superset of C++ with custom annotations to distinguish between device (GPU) functions and host (CPU) functions. It also has special variables for GPU thread IDs and a special syntax to schedule a GPU function.
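To make the thread ID variables a bit more concrete, here is a minimal sketch in plain C of the index arithmetic a one-dimensional kernel typically performs with blockIdx.x, blockDim.x and threadIdx.x. The function names global_thread_index and blocks_for are hypothetical, chosen for illustration only; they are not part of Cuda or of this repository.

```c
#include <stddef.h>

/* Flat index of a thread in a 1D launch. This mirrors the usual kernel
   expression blockIdx.x * blockDim.x + threadIdx.x.
   (Hypothetical helper, for illustration only.) */
static size_t global_thread_index(size_t block_idx, size_t block_dim,
                                  size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

/* Number of blocks needed so that blocks * block_dim >= n, i.e. a ceiling
   division. block_dim must be non-zero.
   (Hypothetical helper, for illustration only.) */
static size_t blocks_for(size_t n, size_t block_dim) {
    return (n + block_dim - 1) / block_dim;
}
```

Because the block count is rounded up, the last block may contain threads whose index is past the end of the data, which is why kernels usually start with a bounds check.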

You're supposed to compile this Cuda code using nvcc, NVidia's proprietary compiler. But Cuda also has a C API that you can easily call from Zig. You can also load device code using the PTX "assembly" format. This assembly can be produced by nvcc itself, allowing you to write only the GPU code in C, compile it with nvcc, and load it from your Zig code. Since Zig can parse the C code for the GPU, it knows the signatures of your device functions and can call them properly.

The second, more experimental, way is to generate the PTX using LLVM through Zig stage 2. That way you can write both host and device code in Zig.

Project structure

This repo is divided into several parts:

  • cudaz folder contains the "library" code
  • CS344 contains code for all the lessons and homework. Typically the code is divided in two files:
    • Host code: hw1.zig
    • Device code: hw1.cu or hw1_kernel.zig
  • lodepng is a dependency to read/write images.
    • Run git submodule init; git submodule update to fetch it.

Using Zig to drive the GPU

A lot of the magic happens in build.zig, notably in the addCudaz function. Generally we assume one executable will only have one .ptx. This is actually important because we need to cImport cuda.h and your device code at the same time. I don't think it's a huge constraint since you can include several files in your main device code file.

The main gotcha is that the .cu code must be C, not C++. To help with this you can include cuda_helpers.h, which defines a few helpers like min/max. You also need to disable name mangling by wrapping your full device code with:

#ifdef __cplusplus
extern "C" {
#endif

...

#ifdef __cplusplus
}
#endif

The #ifdef __cplusplus is unfortunately needed because a bare extern "C" would trip up the Zig C-parser.
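As a hedged illustration of the wrapper, here is a sketch of what a small C-only device file could look like. The kernel add_one and the helper emulate_launch are hypothetical and are not taken from the CS344 folder. The stand-ins under #ifndef __CUDACC__ are an assumption added so the same file also builds with a plain C compiler; with nvcc, the real __global__, threadIdx, blockIdx and blockDim are used instead.

```c
/* Stand-ins for a plain C compiler (assumption for this sketch only).
   nvcc defines __CUDACC__ and provides the real built-ins. */
#ifndef __CUDACC__
#define __global__
typedef struct { unsigned int x, y, z; } Dim3Stub;
static Dim3Stub threadIdx, blockIdx, blockDim;
#endif

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical kernel: each thread adds one to a single element of data. */
__global__ void add_one(float *data, int n) {
    int i = (int)(blockIdx.x * blockDim.x + threadIdx.x);
    if (i < n) data[i] += 1.0f; /* extra threads past the end do nothing */
}

#ifdef __cplusplus
}
#endif

#ifndef __CUDACC__
/* Host-only emulation: run the kernel body once per (block, thread) pair,
   sequentially, by setting the stand-in index variables by hand. */
static void emulate_launch(float *data, int n,
                           unsigned int blocks, unsigned int threads) {
    blockDim.x = threads;
    for (unsigned int b = 0; b < blocks; ++b) {
        for (unsigned int t = 0; t < threads; ++t) {
            blockIdx.x = b;
            threadIdx.x = t;
            add_one(data, n);
        }
    }
}
#endif
```

The kernel uses only plain C constructs (no templates, references or overloads), which is what allows the Zig C-parser to read its signature.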

I recommend looking at the examples to learn more about how the API works, and also taking the full class :-)

To use block-shared memory (the __shared__ keyword in Cuda) you'll need to use the SHARED macro defined in cuda_helpers.h.

The main issue with the Cuda API is that most operations use the default context and default GPU. This makes it a bit awkward if you need to write code to drive two GPUs, because you'll need to call cuCtxPushCurrent/cuCtxPopCurrent every time you want to talk to the other GPU. I haven't tried to fix this in the Zig wrapper, which is just a thin wrapper with some utility functions (also, my laptop has only one GPU).

Using Zig to write device code

For this I'm using the stage2 compiler. Zig stage1 can theoretically target the PTX platform too, but it seems to be broken in 0.9 dev versions. (The nvptx-cuda platform is at Tier 4 of support, which means "unsupported by Zig but LLVM has the flag, so maybe it will work".) I was able to use a light fork of Zig stage2 to generate a .ptx, though, without having to do any crazy stuff. More details and pointers can be found in this "documentation issue".