
llama-jarvis (public) · 3 stars · 0 forks · 0 issues

Commits

List of commits on branch main:

- 7de19cee7b5fef1bb143151f5abc7955f2db7761 feat(tests): Add basic tests for utils and models (jjohnsutor, 3 months ago)
- f537dabf28300779d7fc9d523b35ca1206962fda fix(model): Update ruff and fix issue (jjohnsutor, 3 months ago)
- eaafa6e306d3637021ec4ad6cfccc75694e0aa60 feat(readme): Update README with examples and roadmap (jjohnsutor, 3 months ago)
- b28e76b5219be660c6805e7ef719288676ae92f6 feat(model): Update forward method to support phase 1 and 2 labels and losses (jjohnsutor, 3 months ago)
- b50324078821b4906b63192389c117156694af8d fix(model): Pass in speech IDs instead of embeddings (jjohnsutor, 4 months ago)
- e320b0304c4589680a9f8795281d62b55100136d Initial commit (jjohnsutor, 4 months ago)

README


🦙🎤 Llama-Jarvis


Train a speech-to-speech model using your own language model. Currently based on the Seamless Model, with plans to support more models in the future.

This model is based on speech-to-speech models such as Llama-Omni. However, it aims to take advantage of the joint speech-text embeddings of the Seamless Model.

This code is very much a work in progress. Any and all contributions are welcome!

Why this Library?

This library aims to make speech-to-speech models more compatible with the HuggingFace ecosystem, rather than requiring you to modify your models and datasets to work with a new library. This allows us to take advantage of things like the HuggingFace Trainer.
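
As a rough illustration of that compatibility, the sketch below trains a JarvisModel with the HuggingFace Trainer. This is a hypothetical example, not an official recipe from this repository: it assumes JarvisModel follows the standard PreTrainedModel interface, and the toy one-example dataset and hyperparameters are placeholders only.

# Hypothetical sketch: training JarvisModel with the HuggingFace Trainer.
# Assumes JarvisModel behaves like a standard PreTrainedModel; the dataset
# and hyperparameters below are illustrative placeholders.
from transformers import Trainer, TrainingArguments

from llama_jarvis.model import JarvisConfig, JarvisModel, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"

model = JarvisModel(JarvisConfig(BASE_LLM, SEAMLESS_MODEL))
processor = JarvisProcessor(BASE_LLM, SEAMLESS_MODEL)

# A plain list of feature dicts satisfies Trainer's map-style dataset
# interface; here we build a toy one-example dataset from the processor.
# (Extra forward kwargs such as tgt_lang are elided in this sketch.)
features = processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang="eng",
    return_tensors="pt",
    padding=True,
)
train_dataset = [{k: v.squeeze(0) for k, v in features.items()}]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./jarvis-ckpts", per_device_train_batch_size=1),
    train_dataset=train_dataset,
)
trainer.train()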

Getting Started

NOTE: For some of the examples below, you may first have to log in to HuggingFace to gain access to gated models (especially the Llama models).
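
One way to authenticate (using the standard huggingface_hub helper, not anything specific to this library) is sketched below:

# Log in to HuggingFace so gated checkpoints (e.g., Llama models) can be
# downloaded. Prompts for a token interactively, or pass one explicitly.
from huggingface_hub import login

login()  # or login(token="hf_...") with a token from your HF account settings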

Installation

pip install llama-jarvis

Install Locally

git clone https://github.com/johnsutor/llama-jarvis
cd llama-jarvis 
pip install -e . 

Phase One Loss

The example code below returns the phase one loss (i.e., the loss used when training the first phase of Llama-Omni).

from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
LANGUAGE = "eng"

jarvis_config = JarvisConfig(
    BASE_LLM,
    SEAMLESS_MODEL
)
jarvis_model = JarvisModel(jarvis_config)
jarvis_processor = JarvisProcessor(
    BASE_LLM,
    SEAMLESS_MODEL
)

inputs = jarvis_processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang=LANGUAGE,
    return_tensors="pt",
    padding=True
)

outputs = jarvis_model.forward(
    **inputs,
    tgt_lang=LANGUAGE
)

print(outputs.loss)

Phase Two Loss

The example code below returns the phase two loss (i.e., the loss used when training the second phase of Llama-Omni). The only difference from the phase one example is passing train_phase=2 to forward.

from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
LANGUAGE = "eng"

jarvis_config = JarvisConfig(
    BASE_LLM,
    SEAMLESS_MODEL
)
jarvis_model = JarvisModel(jarvis_config)
jarvis_processor = JarvisProcessor(
    BASE_LLM,
    SEAMLESS_MODEL
)

inputs = jarvis_processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang=LANGUAGE,
    return_tensors="pt",
    padding=True
)

outputs = jarvis_model.forward(
    **inputs,
    tgt_lang=LANGUAGE,
    train_phase=2
)

print(outputs.loss)

Roadmap

  • [x] Release the code on PyPI
  • [ ] Train a baseline model using Llama 3.2 1B and Seamless Medium
  • [ ] Provide training example code
  • [ ] Fully document the code
  • [ ] Create an inference script for the model
  • [ ] Write thorough tests for the code (~85% coverage), and test with a multitude of open-source models

Other Cool Libraries

We take a lot of inspiration from some other nice open-source libraries out there. Shoutout to