
tma-micro-bench

Public repository: 0 stars, 1 fork, 1 issue

Commits

List of commits on branch master.
f7944c74dd0ca7766ff760cb6226134ee1571fb6  Update README.md  (eembg, 6 months ago, Verified)
42466972d880f8992789393a144c75e1b340908a  Update README.md  (eembg, 6 months ago, Verified)
b11e8bb9960617ff400cd5aeb8151c14b5e3941c  remove triton dependency from test.py  (eembg, 6 months ago, Unverified)
16caa493569673a6e0bf9a868be4bc3b26506486  Update README.md  (eembg, 6 months ago, Verified)
8f315e4d4ab8026257f541cbb28decc8d12ad1df  update readme and denoise script  (eembg, 6 months ago, Unverified)
be52631599fb2a38c288ecafa290c2f04ceb110a  TMA descriptor micro-benchmarks  (eembg, 6 months ago, Unverified)

README

tma-micro-bench

sudo dnf install cuda-toolkit-12-4
export CUDA_HOME=/usr/local/cuda-12.4/
export TORCH_CUDA_ARCH_LIST="9.0a"
python setup.py develop
python test.py
python benchmark.py
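
Before running `python setup.py develop`, it can help to confirm that the two exports above actually took effect. The following is a minimal sketch, not part of this repository: `check_build_env` is a hypothetical helper, and it assumes that `nvcc` lives at `$CUDA_HOME/bin/nvcc` (the usual toolkit layout) and that `TORCH_CUDA_ARCH_LIST` entries are separated by spaces or semicolons.

```python
import os
from pathlib import Path


def check_build_env(env=None):
    """Return a list of problems found in the build environment.

    Hypothetical helper for illustration; it only inspects the two
    variables exported in the README and does not touch the GPU.
    """
    env = os.environ if env is None else env
    problems = []

    # CUDA_HOME should point at a toolkit that contains nvcc
    # (assumed location: $CUDA_HOME/bin/nvcc).
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif not (Path(cuda_home) / "bin" / "nvcc").exists():
        problems.append(f"nvcc not found under {cuda_home}")

    # The README builds for the 9.0a architecture only.
    arch_list = env.get("TORCH_CUDA_ARCH_LIST", "").replace(";", " ").split()
    if "9.0a" not in arch_list:
        problems.append("TORCH_CUDA_ARCH_LIST does not include 9.0a")

    return problems


if __name__ == "__main__":
    issues = check_build_env()
    if issues:
        for issue in issues:
            print("problem:", issue)
    else:
        print("build environment looks consistent with the README")
```

An empty list means both variables match what the README expects; it says nothing about whether the installed driver or GPU supports the target architecture.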

Note: test.py is intentionally left broken for now so that NVIDIA folks can repro the bug I'm seeing with cpfence. benchmark.py does not use the broken function. If you just want to make pretty charts, comment out the last line of test.py.

[Image: graph]