
ScalableGen


Commits

List of commits on branch main.
Verified
624443f2f675ae7b7ef7e049e7530b8777cf4c3c

Update README.md

August-murr committed 6 months ago
Verified
7ef17e6dec193d0c3e85de8fb8eb6bafc331239d

Update README.md

August-murr committed 6 months ago
Verified
1c0b18a238b039d6fa8f1c2637aeca0752ef067d

Update README.md

August-murr committed 6 months ago
Verified
48d6e08d050581ea8ac37deb6a4ff7c873c58272

Update README.md

August-murr committed 6 months ago
Verified
c834debe9be48c6a4ac9c58b68a3db56732c899d

Update README.md

August-murr committed 6 months ago
Verified
ea3923b66c3ade8b59ce54dedac4ef0e3650a89b

Create README.md

August-murr committed 6 months ago

README


Scalable GSM8K Response Generator

This repository coordinates multiple free Kaggle 2xT4 GPU instances through a central MongoDB database (free tier) to generate and refine responses to tasks such as GSM8K math questions. The generated responses can be used as synthetic training data or to benchmark a model on GSM8K.

Features

  • Parallel Model Execution: Uses both GPUs of an instance to run two copies of the model in parallel, maximizing generation throughput.
  • Batch Processing: Processes questions in batches to further increase throughput.
  • Scalable Architecture: Connects multiple Kaggle instances to one central MongoDB database, so adding instances speeds up data generation.
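
The first two features can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function names `batched` and `run_on_devices`, the `generate_batch` callback, and the use of threads (rather than processes) are all assumptions made for the example. It splits the questions round-robin across the available devices, lets each worker process its share in batches, and then restores the original order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def run_on_devices(
    questions: Sequence[str],
    devices: Sequence[str],
    generate_batch: Callable[[str, List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Run `generate_batch` on every device in parallel, one worker per device.

    `generate_batch(device, batch)` is a hypothetical callback that must
    return one response per question in `batch`.
    """
    n = len(devices)
    # Round-robin split: question i goes to device i % n.
    shards = [list(questions[i::n]) for i in range(n)]

    def worker(device: str, shard: List[str]) -> List[str]:
        outputs: List[str] = []
        for batch in batched(shard, batch_size):
            outputs.extend(generate_batch(device, batch))
        return outputs

    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(worker, d, s) for d, s in zip(devices, shards)]
        per_device = [f.result() for f in futures]

    # Question i sits at position i // n inside the shard of device i % n.
    return [per_device[i % n][i // n] for i in range(len(questions))]
```

On a Kaggle 2xT4 instance the devices would typically be `"cuda:0"` and `"cuda:1"`, with `generate_batch` wrapping a model copy loaded on the given device.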

Usage

File Descriptions

  • one_shot_prompt.txt: A one-shot prompt used to extract the answer to a math question as a number for evaluation.
  • retry_prompt.txt: A prompt that instructs the model to review its previous answer, if it was incorrect, and attempt the question again.
  • requirements.txt: Lists necessary libraries, most of which are pre-installed on Kaggle instances.
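
The evaluation and retry steps these prompt files support might look roughly like the sketch below. Two caveats: the repository extracts the numeric answer with a prompt, whereas this example swaps in a regular expression purely to keep the illustration self-contained, and `RETRY_TEMPLATE` is a hypothetical stand-in, not the actual contents of retry_prompt.txt. The only dataset fact relied on is that GSM8K reference solutions end with a line of the form `#### <number>`.

```python
import re
from typing import Optional

# Matches integers and decimals, optionally negative, with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_number(text: str) -> Optional[float]:
    """Return the last number found in `text`, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def gold_answer(gsm8k_answer: str) -> Optional[float]:
    """Read the reference value that follows the final '####' marker."""
    _, marker, tail = gsm8k_answer.rpartition("####")
    return parse_number(tail) if marker else None


def is_correct(model_output: str, gsm8k_answer: str, tol: float = 1e-6) -> bool:
    """Compare the model's final number with the GSM8K reference value."""
    predicted = parse_number(model_output)
    expected = gold_answer(gsm8k_answer)
    if predicted is None or expected is None:
        return False
    return abs(predicted - expected) <= tol


# Hypothetical template; the real retry_prompt.txt may be worded differently.
RETRY_TEMPLATE = (
    "Question: {question}\n"
    "Your previous answer was incorrect:\n{previous}\n"
    "Review your reasoning and answer the question again."
)


def build_retry_prompt(question: str, previous: str) -> str:
    """Fill the retry template with the question and the earlier response."""
    return RETRY_TEMPLATE.format(question=question, previous=previous)
```

Taking the last number in the output is a common heuristic for chain-of-thought answers, but it can misfire when the model appends extra text after its final answer, which is one reason a prompt-based extraction step can be preferable.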

Main Script Capabilities

  • MongoDB Database and Collection Creation: Creates the MongoDB databases and collections that hold the GSM8K train and test sets for response generation and evaluation.
  • Retry Collection Creation: Generates a "retry" collection for refining and making second attempts at answering.
  • PEFT Model Integration: Wraps the base model in a PEFT model to evaluate and generate responses with fine-tuned models.
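
One way several instances can share work through a single database is to store each question as a document with a status field and let every worker atomically claim the next pending one. The sketch below assumes that design; the field names (`status`, `worker`, `response`, `correct`) and all function names are hypothetical and not taken from the repository. The `collection` argument is expected to behave like a pymongo `Collection`, whose `find_one_and_update(filter, update)` and `update_one(filter, update)` methods are used as-is.

```python
from typing import Any, Dict, Iterable, List, Optional

Document = Dict[str, Any]


def to_documents(rows: Iterable[Dict[str, str]], split: str) -> List[Document]:
    """Turn GSM8K rows ('question' and 'answer' fields) into pending work items."""
    return [
        {
            "split": split,
            "index": i,
            "question": row["question"],
            "answer": row["answer"],
            "status": "pending",   # assumed lifecycle: pending -> in_progress -> done
            "response": None,
        }
        for i, row in enumerate(rows)
    ]


def claim_next(collection: Any, worker_id: str) -> Optional[Document]:
    """Atomically claim one pending document; returns None when none are left.

    With pymongo's default settings the returned document is the version
    from before the update, so its status still reads 'pending'.
    """
    return collection.find_one_and_update(
        {"status": "pending"},
        {"$set": {"status": "in_progress", "worker": worker_id}},
    )


def save_response(collection: Any, doc_id: Any, response: str, correct: bool) -> None:
    """Store the generated response and mark the document as finished."""
    collection.update_one(
        {"_id": doc_id},
        {"$set": {"status": "done", "response": response, "correct": correct}},
    )


def retry_documents(finished: Iterable[Document]) -> List[Document]:
    """Build second-attempt work items from finished but incorrect documents."""
    return [
        {
            "source_id": doc["_id"],
            "question": doc["question"],
            "answer": doc["answer"],
            "previous_response": doc["response"],
            "status": "pending",
        }
        for doc in finished
        if doc.get("status") == "done" and not doc.get("correct")
    ]
```

Because the claim is a single atomic update on the server, two Kaggle instances polling the same collection should not receive the same question. The output of `to_documents` and `retry_documents` could be written with pymongo's `insert_many`.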

Future Development

This framework is under active development and will be extended to cover more use cases. There are many ways to generate responses and to prompt an LLM (Large Language Model) to refine, reattempt, and explore questions or tasks. Contributions and suggestions are welcome!