tma-micro-bench

Commits

List of commits on branch master.

f7944c74dd0ca7766ff760cb6226134ee1571fb6  Update README.md (eembg, 6 months ago, verified)
42466972d880f8992789393a144c75e1b340908a  Update README.md (eembg, 6 months ago, verified)
b11e8bb9960617ff400cd5aeb8151c14b5e3941c  remove triton dependency from test.py (eembg, 6 months ago, unverified)
16caa493569673a6e0bf9a868be4bc3b26506486  Update README.md (eembg, 6 months ago, verified)
8f315e4d4ab8026257f541cbb28decc8d12ad1df  update readme and denoise script (eembg, 6 months ago, unverified)
be52631599fb2a38c288ecafa290c2f04ceb110a  TMA descriptor micro-benchmarks (eembg, 6 months ago, unverified)

sudo dnf install cuda-toolkit-12-4
export CUDA_HOME=/usr/local/cuda-12.4/
export TORCH_CUDA_ARCH_LIST="9.0a"
python setup.py develop
python test.py
python benchmark.py

Note: test.py is currently left broken so that NVIDIA folks can repro the bug I'm seeing with cpfence. benchmark.py does not use the broken function. If you just want to make pretty charts, comment out the last line of test.py.
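
Before running `python setup.py develop`, it can help to confirm that the two environment variables from the setup commands above are actually set in the current shell. The following is a minimal sketch, not part of this repo: `check_build_env` is a hypothetical helper, and it only compares against the values shown in this README (CUDA 12.4 and arch `9.0a`). It does not inspect the GPU or the installed toolkit.

```python
import os

# Values copied from the setup commands in this README.
EXPECTED_ARCH = "9.0a"
EXPECTED_CUDA_TAG = "12.4"


def check_build_env(env):
    """Return a list of problems found in `env`, a mapping of env variables.

    Hypothetical helper for illustration; an empty list means the
    environment matches what the README exports.
    """
    problems = []

    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif EXPECTED_CUDA_TAG not in cuda_home:
        problems.append(
            f"CUDA_HOME={cuda_home!r} does not mention {EXPECTED_CUDA_TAG}"
        )

    arch = env.get("TORCH_CUDA_ARCH_LIST", "")
    if arch != EXPECTED_ARCH:
        problems.append(
            f"TORCH_CUDA_ARCH_LIST={arch!r}, expected {EXPECTED_ARCH!r}"
        )

    return problems


if __name__ == "__main__":
    issues = check_build_env(os.environ)
    for issue in issues:
        print("warning:", issue)
    if not issues:
        print("build environment matches the README")
```

Run it in the same shell as the `export` lines. Any warning means the extension may be built against a different toolkit or architecture than the one this README assumes.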