mistral-hackathon-finetuning

Mistral Finetuning Hackathon 2024

For instructions on running the solution, click here.

Alplex: An AI-based Virtual Law Office

Introducing Alplex, an AI-powered virtual law office designed to assist you with legal issues based on Swiss laws.

Key Features

AI Legal Assistant - Dona:
- Clarification & Summarization: Receive your case and help summarize it.
- Technology: Powered by an Autogen Conversable Agent and a fine-tuned Mistral 7B model.
AI Paralegal - Rachel:
- Case Classification: Classifies your case into the correct legal category.
- RAG over Swiss Laws: Uses a large Mistral model to perform Retrieval-Augmented Generation over relevant Swiss laws.

Application Interface

Fine-tuning with Mistral API

We leveraged the Mistral fine-tuning API for two critical aspects:

Improving Dona: Enhanced guardrails and distilled from larger models (notebooks/04_dona_finetuning.ipynb)
Better Case Classification: Optimized classification accuracy for legal cases. (notebooks/05_classification_finetuning.ipynb)

Solution Diagram

Finetuning Usage

Fine-tuning for Dona

Goals

Robust Client Interaction:
- Good resilience against prompt hacking.
- Created a dataset with a mix of legitimate replies and placeholders for prompt hacking scenarios.
Enhanced Responses:
- Distilled from larger models to improve response quality.
- Used GPT-4o outputs to inspire the Mistral 7B model for better summaries.
Cost and Performance Efficiency:
- Autogen agent requiring multiple interactions.
- Fine-tuned smaller model for efficiency and scalability.

Fine-tuning for Classification

We prepared a dataset of legal cases categorized under Civil, Public, or Criminal law and evaluated various models:

Baseline: Traditional ML (TFIDF+LGBM).
Mistral 7B: Prompting only.
Mistral 7B (Fine-tuned): Significant performance improvement, reduced hallucinations.

Classification Results (Fold 0 of Stratified 5-Fold CV)

TFIDF+LGBM: Accuracy 0.86
Mistral 7B: Accuracy 0.55
Mistral 7B (Fine-tuned): Accuracy 0.71

Limitations

Supports only Swiss Federal Laws.
Handles only Civil, Public, or Criminal law cases.
Case classification could be improved (class imbalance).
The agentic RAG (Rachel) could make several iteration to improve the final answer.

How to Run

git clone git@github.com:unit8co/mistral-hackathon-finetuning.git
cd mistral-hackathon-finetuning

# Ensure you have Python 3.11+ and Node.js + npm (tested with Node v22.1.0, npm 10.7.0) for the frontend.

# Install necessary assets:
# download chroma.zip at https://mistral-finetuning-hackathon-2024.s3.eu-central-1.amazonaws.com/chroma.zip
# move it into the root of the repository
# unzip it in the root of the repo

# Create a virtual environment
python -m venv .venv

# Install dependencies
pip install -r requirements.txt

# Create a .env file and enter your Mistral API key
cp .env.template .env

# Start the backend
PYTHONPATH=$(pwd) python src/backend/main.py

# In another terminal, navigate to the frontend folder and run the frontend
cd src/frontend
# Install Node.js dependencies
npm install
# Run the frontend
npm run dev

# Follow the localhost URL displayed to start interacting with Dona and Rachel.

mistral-finetuning-hackathon

Commits

Update README.md

fix: refactoring, reorganizing the files

fix: improve readme

Merge branch 'main' of https://github.com/unit8co/mistral-hackathon-finetuning

fix: raw law location