
Training Autoregressive Transformers

More capable than nanoGPT, just as much fun! This is my library for experimenting with transformers. I am particularly interested in exploring byte-level, tokenizer-free, hierarchical autoregressive models.

Uses flash attention v2 for maximum speed.
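For reference, flash attention produces the same result as standard causal attention; it just tiles the computation so the full (T, T) score matrix never has to be materialized in memory. A minimal, unfused numpy sketch of what is being computed (illustrative only, not this library's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v):
    """Reference causal attention. Flash attention computes the same
    output but fuses the matmul/softmax/matmul into one tiled kernel."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                    # (T, T) logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                           # no attending to the future
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = causal_attention(q, k, v)
print(out.shape)  # (4, 8)
```

Because of the causal mask, position 0 can only attend to itself, so its output is exactly `v[0]` — a quick sanity check for any attention implementation.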

Included model types:

  • gpt2: vanilla GPT-2 architecture
  • ibt (improved baseline transformer): achieves lower validation loss than vanilla GPT-2 at similar compute and memory cost by incorporating recent tricks: rotary embeddings, time shifting, GEGLU, improved initialization, RMSNorm, and sliding window attention.
  • hourglass (WIP): hourglass transformers for efficient character-level modeling with a long context window. Still working to match the perf/compute/memory of the models above.
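Two of the tricks listed above, RMSNorm and rotary embeddings, are simple enough to sketch in a few lines of numpy. This is an illustrative sketch of the standard formulations, not this library's implementation:

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    """RMSNorm: rescale by root-mean-square instead of LayerNorm's
    mean/variance pair -- cheaper, no mean subtraction or bias."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return gain * x / rms

def rotary(x, base=10000.0):
    """Rotary position embeddings: rotate each (even, odd) channel pair
    by a position-dependent angle, so dot products between queries and
    keys depend on relative position."""
    T, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,) per-pair frequencies
    theta = np.outer(np.arange(T), inv_freq)       # (T, d/2) rotation angles
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.default_rng(1).standard_normal((5, 16))
y = rmsnorm(x, gain=np.ones(16))
r = rotary(x)
```

Useful properties to note: with unit gain, the output of `rmsnorm` has RMS ≈ 1 along the last axis, and `rotary` is a pure rotation, so it preserves vector norms and leaves position 0 unchanged (all angles are zero there).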