
llama-2-local-ui (public: 7 stars, 0 forks, 0 issues)

Commits

List of commits on branch main:

  • 7f3207d3e58b6429b03554ed8a706b4b129f790f (Verified): Add link to kaggle in README.md (xxhluca, a year ago)
  • c8c7305d6ac8cc68250ba6ca487119808425756e (Verified): Update README.md (xxhluca, a year ago)
  • 71ce4587bef0fce0dfbf89aa3494a8b79d0e8226 (Verified): Make app more concise (xxhluca, a year ago)
  • b14671e1e11abc4dff18a48beee57039910d8a84 (Unverified): Improve app title (xxhluca, a year ago)
  • 817f394d92eee1b0241e1f7379527088aa95bbd4 (Verified): Make README.md clearer (xxhluca, a year ago)
  • 43cec923c95aae905fe6259a63e74ffe9c0e50ec (Verified): Add readme and demo image (xxhluca, a year ago)

README

The README file for this repository.

LLaMA-2 Local Chat UI

This app lets you run LLaMA v2 locally via Gradio and Huggingface Transformers. Other demos rely on the Huggingface inference server or Replicate, both hosted solutions accessed through a web API. This demo instead runs the model directly on your device (assuming you meet the requirements).

An image showing the user interface of the app

Kaggle Demo

Requirements

  • You need a GPU with at least 10GB of VRAM (more is better and gives the model headroom to run reliably).
  • You need to have access to the Huggingface meta-llama repositories, which you can obtain by filling out the form.
  • You need to create a Huggingface access token and expose it as an environment variable called HUGGINGFACE_TOKEN, e.g. by exporting it in your .bashrc.
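The token step above looks like this in practice; the token value shown is a placeholder, not a real credential:

```shell
# Make the token available to the app in the current shell.
# Replace the placeholder with your own token from the Huggingface settings page.
export HUGGINGFACE_TOKEN=hf_your_token_here

# To persist it across shells, append the same export line to ~/.bashrc:
# echo 'export HUGGINGFACE_TOKEN=hf_your_token_here' >> ~/.bashrc
```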

Usage

Clone this repository and enter it:

git clone https://github.com/xhluca/llama-2-local-ui
cd llama-2-local-ui

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate

Install the dependencies:

pip install -r requirements.txt

Run the app:

python app.py

Details

You can modify app.py for more control. By default, it uses 4-bit inference (see the blog post). It is a very simple app (~100 lines), so it should be straightforward to understand. The streaming part relies on threading and queue, but you probably won't need to touch that unless you want to change the streaming behavior.
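The threading-and-queue streaming mentioned above follows a common producer/consumer pattern: a background thread generates tokens and pushes them into a queue, while the UI thread drains the queue and yields a growing partial response. A minimal sketch of that pattern in plain Python (generate_tokens is a stand-in for the model's generation loop, not code from app.py):

```python
import queue
import threading


def generate_tokens(prompt, out_queue, sentinel):
    # Stand-in for model generation: push tokens into the queue as they
    # are "produced", then a sentinel object to signal completion.
    for token in prompt.split():
        out_queue.put(token)
    out_queue.put(sentinel)


def stream_response(prompt):
    # Consumer side: drain the queue as tokens arrive and yield the
    # growing partial response -- the shape a Gradio streaming callback
    # typically has.
    sentinel = object()
    q = queue.Queue()
    worker = threading.Thread(target=generate_tokens, args=(prompt, q, sentinel))
    worker.start()
    partial = []
    while True:
        token = q.get()
        if token is sentinel:
            break
        partial.append(token)
        yield " ".join(partial)
    worker.join()


for chunk in stream_response("streaming via a queue"):
    print(chunk)
```

Each yielded chunk is the response so far, which is what lets the UI repaint the message as it grows instead of waiting for generation to finish.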

Acknowledgement

The app extends @yvrjsharma's original app, adding transformers, threading, and queue.