
pire

public · 1 star · 0 forks · 7 issues

Commits

List of commits on branch main.

  • Verified · 43c5d4b4005a0de07d9f5f75870c20542684c938 · "More details on README.md" · mmert-kurttutan committed 3 days ago
  • Unverified · f4ca84f7f206795bd32cbaf3d2f601f7b49b8aa3 · "Initial readme" · mmert-kurttutan committed 3 days ago
  • Unverified · 4015ca2179c0ecd461a608744ccda3c5d54edde2 · "Fix tranpose bug, start new mathfun" · mmert-kurttutan committed 13 days ago
  • Unverified · a2cee5d7895f37fcfb2072ca2e20eb4d15b594ce · "Prefetch b to i8 i16 gemm" · mmert-kurttutan committed 25 days ago
  • Unverified · c02b25acf668cad8d304fd0127350b5de3d2cfba · "Update default cblas_path" · mmert-kurttutan committed 25 days ago
  • Verified · d86f3fe31490e60b6c8cbcf4ebcfe85564623ef8 · "Uv (#48)" · mmert-kurttutan committed 25 days ago

README

The README file for this repository.

Pire: Package for high-performance CPU kernels

State-of-the-art GEMM-like kernels for CPUs, written with inline assembly.

Includes quantized GEMM, sgemm, hgemm, dgemm, and integer GEMM.

More kernels used in LLM inference are in progress.

Features:

  • packed API for matrices
  • GEMM + unary function fusion
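The fusion feature can be illustrated with a minimal reference sketch (this is not Pire's actual API; the real kernels use packed buffers and hand-written assembly): a naive f32 GEMM that applies a unary function to each output element in the same pass, so the result never requires a second sweep over C.

```rust
// Naive reference GEMM with a fused unary epilogue: C = f(A * B).
// Illustrative sketch only. A is m x k, B is k x n, both row-major.
fn gemm_fused(
    m: usize,
    n: usize,
    k: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    f: impl Fn(f32) -> f32,
) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            // Apply the unary function while the accumulator is still
            // in a register, instead of a separate pass over C later.
            c[i * n + j] = f(acc);
        }
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, -4.0]; // 2x2
    let b = [1.0f32, 0.0, 0.0, 1.0]; // 2x2 identity
    let mut c = [0.0f32; 4];
    gemm_fused(2, 2, 2, &a, &b, &mut c, |x| x.max(0.0)); // fused ReLU
    println!("{:?}", c); // prints [1.0, 2.0, 3.0, 0.0]
}
```

The benefit of fusion is that C is written exactly once, saving a full read-modify-write pass over the output matrix when an activation function follows the GEMM.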

State of the Art:

  • Performance within 1% of MKL; see the benchmark directory for results
  • The tn and nt layouts (the dot kernel in BLIS terminology) still need some optimization

Why this, not BLIS or OpenBLAS?

  • I wanted to write something of my own so I could explore the path toward state-of-the-art performance faster. I also wanted to write it in Rust, which has two features I like very much: runtime dispatch via #[target_feature(enable = "avx,avx2,fma")], and inline assembly that, combined with Rust's macro system, is very convenient for writing hand-optimized GEMM kernels.
  • Those projects also don't offer several features I wanted: a packed interface, unary function fusion, integer GEMM, and inline-assembly quantized GEMM kernels for LLMs (work in progress).
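The runtime-dispatch pattern mentioned above can be sketched as follows (a minimal example, not Pire's actual code): detect CPU features at run time with `is_x86_feature_detected!`, then call a `#[target_feature]`-compiled kernel through an unsafe wrapper.

```rust
// Minimal runtime-dispatch sketch (not Pire's actual kernels): pick an
// AVX2+FMA code path when the CPU supports it, else a scalar fallback.

// Compiled with AVX2/FMA enabled regardless of the global target;
// calling it is only sound after a runtime feature check.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    // The compiler is free to auto-vectorize this body with AVX2/FMA.
    xs.iter().sum()
}

fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

fn sum_dispatch(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Safe: the required CPU features were just verified.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}

fn main() {
    let xs: Vec<f32> = (1..=8).map(|i| i as f32).collect();
    println!("{}", sum_dispatch(&xs)); // prints 36
}
```

This keeps one binary that runs on any x86-64 CPU while still exploiting newer instruction sets where available, which is what makes `#[target_feature]` attractive for GEMM kernel dispatch.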

Another way to look at this project: a collection of highly optimized CPU kernels for their respective GEMM operations, usable as a reference for writing high-performance numerical JIT kernels.