
scaling-laws-for-language-transfer

public
9 stars
1 fork
1 issue

Commits

List of commits on branch main.

  • 784c4104d9ffe96584efdaa808af05e1875edc70 (Verified): Update README.md, cchristinakim committed 3 years ago
  • fbf62eca08aebb61d4a5d1903b7d5a3d95b945d6 (Verified): Update README.md, cchristinakim committed 3 years ago
  • bbaac816e131973c3d58975c17dc47a166588a11 (Verified): Update README.md, cchristinakim committed 3 years ago
  • af3e6042537efbe072a777fe4fc6d57426ea2645 (Unverified): datasets, cchristinakim committed 3 years ago
  • 257e61198a86002fcac1f3a3468b3537426b88c0 (Unverified): datasets, cchristinakim committed 3 years ago
  • 4698d2c74a8e917e5baaa71228a20d9fdd868333 (Unverified): fixing readme, cchristinakim committed 3 years ago

README

The README file for this repository.

Scaling Laws for Language Transfer Learning

Code and models from the blog post Scaling Laws for Language Transfer Learning

Motivation

Building upon Scaling Laws for Transfer (Hernandez et al., 2021), my experiments explore how English pre-training affects fine-tuning on non-English languages, aiming to answer the question: how much does pre-training on English help when transferring to different languages as we vary the dataset size and the model size?
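
That paper frames transfer in terms of "effective data transferred": the amount of extra pre-training-equivalent data that pre-training is worth, modeled as a power law in both the fine-tuning dataset size and the (non-embedding) model size. The sketch below only evaluates that functional form; the constants are illustrative placeholders, not values fitted in these experiments.

```python
# Illustration of the "effective data transferred" power law from
# Scaling Laws for Transfer (Hernandez et al., 2021): D_T = k * D_F**alpha * N**beta,
# where D_F is the fine-tuning dataset size and N the non-embedding parameter count.
# The constants below are placeholders, NOT fitted values from this repo's experiments.

def effective_data_transferred(d_finetune: float, n_params: float,
                               k: float = 1.0, alpha: float = 0.4, beta: float = 0.6) -> float:
    """Pre-training-equivalent tokens that transfer 'buys', under assumed constants."""
    return k * (d_finetune ** alpha) * (n_params ** beta)

# Example: a 124M-parameter model fine-tuned on 10 million tokens of non-English text.
print(f"{effective_data_transferred(10e6, 124e6):.3e}")
```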

Usage

This repo contains the code for:

  1. Reproducing pre-trained decoder-only transformers using hyperparameters from Scaling Laws for Neural Language Models, but trained on OpenWebText2 instead of WebText
  2. Reproducing the language transfer experiments that fine-tune the pre-trained English models on Chinese, Spanish, and German text (see the fitting sketch below)
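
A core piece of the analysis behind both items is fitting power laws to the measured fine-tuning losses as the dataset size varies. A minimal, self-contained sketch of such a fit is below; the numbers are synthetic placeholders, not results from this repo.

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit L(D) = L_inf + c * D**(-alpha) to fine-tuning loss vs. fine-tuning dataset size.
# The data points below are synthetic placeholders, not measurements from this repo.
dataset_sizes = np.array([1e5, 1e6, 1e7, 1e8])   # fine-tuning tokens
eval_losses   = np.array([5.1, 4.3, 3.8, 3.5])   # hypothetical eval losses (nats/token)

def power_law(d, l_inf, c, alpha):
    return l_inf + c * d ** (-alpha)

params, _ = curve_fit(power_law, dataset_sizes, eval_losses, p0=[3.0, 10.0, 0.2])
print(dict(zip(["L_inf", "c", "alpha"], params.round(3))))
```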

All English pre-trained models were trained for 26 billion tokens with no repeats:

  • x6small: 3.3M non-embedding parameters
  • x5small: 16M non-embedding parameters
  • x4small: 39M non-embedding parameters
  • x3small: 51M non-embedding parameters
  • x2small: 70M non-embedding parameters
  • small: 124M non-embedding parameters
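
For a rough sense of where such counts come from: the non-embedding parameter count of a decoder-only transformer is commonly approximated as 12 · n_layer · d_model², the estimate used in Scaling Laws for Neural Language Models. The sketch below just evaluates that estimate for an example shape; it is not the exact configuration behind these checkpoints.

```python
# Approximate non-embedding parameter count of a decoder-only transformer,
# N ≈ 12 * n_layer * d_model**2 (the approximation from Kaplan et al., 2020).
# The example shape is illustrative, not necessarily what these checkpoints use.

def non_embedding_params(n_layer: int, d_model: int) -> int:
    return 12 * n_layer * d_model ** 2

# A GPT-2-small-like shape (12 layers, d_model = 768) gives roughly 85M by this estimate.
print(f"{non_embedding_params(n_layer=12, d_model=768):,}")
```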

Datasets