RAG indexing and search performance benchmark on Intel SPR

Environment preparation

The RAG indexing and search time is measured with the faiss package. The default faiss binaries do not deliver good performance on Intel Xeon platforms, so the CPU version of faiss must be built from source with AVX-512 enabled.

# install Intel oneAPI Base Toolkit (provides oneMKL)
wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/163da6e4-56eb-4948-aba3-debcec61c064/l_BaseKit_p_2024.0.1.46_offline.sh
sudo sh ./l_BaseKit_p_2024.0.1.46_offline.sh

# source the MKL environment variables
source /opt/intel/oneapi/mkl/latest/env/vars.sh

# install the SWIG build dependency
conda install -c conda-forge swig=4.1.1

# build faiss-cpu with AVX-512 support
git clone https://github.com/facebookresearch/faiss.git
cd faiss && mkdir build && cd build
cmake -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_PYTHON=ON -DFAISS_ENABLE_RAFT=OFF -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=ON -DFAISS_OPT_LEVEL=avx512 -DBLA_VENDOR=Intel10_64_dyn ..
# build the generic (AVX2) targets
make -j faiss && make -j swigfaiss
# or build the AVX-512 targets instead (recommended on SPR)
# make -j faiss_avx512 && make -j swigfaiss_avx512
# install the Python bindings, then the C++ library
(cd faiss/python && pip install -e .)
make install
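
To confirm that the self-built package is the one Python picks up (faiss.get_compile_options() should report the instruction sets compiled into the loaded module), a minimal sanity check looks like this:

import faiss
import numpy as np

# Report the instruction-set variants compiled into the loaded module.
print(faiss.__version__, faiss.get_compile_options())

# Tiny smoke test: index 1k random 768-d vectors and run a 10-NN search.
rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 768), dtype=np.float32)
index = faiss.IndexFlatL2(768)
index.add(xb)
distances, ids = index.search(xb[:5], 10)
print(ids.shape)  # (5, 10)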

Benchmark command

The run is pinned to a single 48-core socket and preloads Intel OpenMP (libiomp5) and tcmalloc:

KMP_BLOCKTIME=1 KMP_SETTINGS=1 KMP_AFFINITY=granularity=fine,compact,1,0 OMP_NUM_THREADS=48 LD_PRELOAD=${CONDA_PREFIX}/lib/libiomp5.so:${CONDA_PREFIX}/lib/libtcmalloc.so numactl -l -C 0-47 python benchmark.py --device cpu
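
benchmark.py in this repository defines what is actually timed; purely as an illustration, a minimal indexing-plus-search timing loop of the kind being benchmarked (synthetic data, an assumed IVF-Flat index, and arbitrary sizes) could look like this:

import time
import numpy as np
import faiss

d, nb, nq, k = 768, 100_000, 1_000, 10   # assumed sizes, not the repo's settings
rng = np.random.default_rng(0)
xb = rng.standard_normal((nb, d), dtype=np.float32)
xq = rng.standard_normal((nq, d), dtype=np.float32)

# Indexing time: train + add for an IVF-Flat index.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, 1024)
t0 = time.perf_counter()
index.train(xb)
index.add(xb)
print(f"indexing: {time.perf_counter() - t0:.3f}s")

# Search time.
index.nprobe = 32
t0 = time.perf_counter()
distances, ids = index.search(xq, k)
print(f"search:   {time.perf_counter() - t0:.3f}s")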

RAG indexing and search performance benchmark on Nvidia A100/H100

Environment preparation

conda install -c rapidsai -c conda-forge -c nvidia rmm cuda-version=12.1  # or cuda-version=11.4
conda install -c pytorch -c nvidia -c rapidsai -c conda-forge faiss-gpu-raft=1.8.0
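
To verify that the GPU build of FAISS was installed (rather than a CPU-only fallback), faiss.get_num_gpus() should report at least one device:

import faiss

# 0 means a CPU-only build was picked up; expect >= 1 on an A100/H100 node.
print("GPUs visible to FAISS:", faiss.get_num_gpus())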

Benchmark command

python benchmark.py --device gpu
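
As with the CPU run, benchmark.py is the source of truth for what is measured; as an illustration only, cloning a CPU-built index onto the first GPU with the classic FAISS GPU API looks like this (index type and sizes are assumptions):

import numpy as np
import faiss

d = 768
rng = np.random.default_rng(0)
xb = rng.standard_normal((100_000, d), dtype=np.float32)

cpu_index = faiss.IndexFlatL2(d)
cpu_index.add(xb)

# Clone the index onto GPU 0 and run the search there.
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)
distances, ids = gpu_index.search(xb[:1000], 10)
print(ids.shape)  # (1000, 10)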

RAG embedding generation performance benchmark on Intel SPR

Environment preparation

Embedding generation time is measured through the LangChain RAG API on an Intel SPR system (1 socket, 48 cores).

conda install astunparse ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
conda install jemalloc gperftools -c conda-forge -y
conda install git-lfs -y

pip install -U langchain
pip install -U langchain-community


python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

pip install sentence_transformers

pip install intel-extension-for-transformers
pip install accelerate
pip install datasets
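
run.sh drives the actual measurement; as an illustration of what is being timed, a minimal embedding-generation sketch with LangChain and IPEX is shown below (the model name, corpus size, and optional IPEX step are assumptions, not the settings used by run.sh):

import time
import intel_extension_for_pytorch as ipex
from langchain_community.embeddings import HuggingFaceEmbeddings

# Hypothetical embedding model; substitute the one configured in run.sh.
emb = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

# Optional: let IPEX optimize the underlying SentenceTransformer in place.
# `emb.client` is an implementation detail of HuggingFaceEmbeddings and may change.
ipex.optimize(emb.client.eval(), inplace=True)

docs = ["an example passage to embed"] * 1024   # synthetic corpus
t0 = time.perf_counter()
vectors = emb.embed_documents(docs)
print(f"embedded {len(docs)} docs in {time.perf_counter() - t0:.2f}s (dim={len(vectors[0])})")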

Benchmark command

source run.sh