
gpt-3-for-pascal

The paper Language Models are Few-Shot Learners describes a neural network architecture built from a stack of transformer decoder modules. Table 2.1 of that paper lists several GPT-3 based architectures. According to that table, GPT-3 Small is composed of:

  • 12 transformer decoders,
  • 12 heads,
  • 768 hidden dimensions (64 hidden dimensions per head) and
  • 3072 intermediate dimensions (768 hidden dimensions times 4), as recomputed in the short sketch after this list.
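
The derived figures above (64 = 768 div 12 and 3072 = 4 × 768) can be checked with a few lines of Free Pascal. The program and constant names below are illustrative and not taken from the repository:

  program CheckGpt3SmallDims;
  const
    Layers    = 12;   // transformer decoder blocks
    Heads     = 12;   // attention heads per block
    HiddenDim = 768;  // hidden (embedding) dimension
  begin
    WriteLn('Hidden dimensions per head: ', HiddenDim div Heads); // 768 div 12 = 64
    WriteLn('Intermediate dimensions:    ', 4 * HiddenDim);       // 4 * 768 = 3072
    WriteLn('Transformer decoder layers: ', Layers);
  end.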

With the CAI Neural API, a neural network model similar to GPT-3 Small can be implemented as follows:

  // Assumed wrapper declaration: the original snippet shows only the function
  // body, so the function name below is illustrative rather than the
  // repository's actual identifier.
  function CreateGPT3SmallModel(pContextSize, pVocabSize, pEmbedDim: integer): THistoricalNets;
  var
    CntLayer: integer;
  begin
    Result := THistoricalNets.Create();
    Result.AddLayer([
      TNNetInput.Create(pContextSize, 1, 1),
      TNNetTokenAndPositionalEmbedding.Create(pVocabSize, pEmbedDim),
      TNNetPointwiseConvLinear.Create({hidden dimensions=}768),
      TNNetSignedSquareRoot1.Create()
    ]);
    for CntLayer := 1 to {Layers=}12 do
    begin
      Result.AddTransformerBlockCAI( {Heads=}12, {intermediate dimensions=}4*768, {NoForward=}true, {HasNorm=}true, false);
    end;
    Result.AddLayer([
      TNNetPointwiseConvLinear.Create(pVocabSize, 1),
      TNNetPointwiseSoftMax.Create(1)
    ]);
  end;
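
The snippet ends with a pointwise linear projection to pVocabSize followed by a softmax, so each output position is a probability distribution over the vocabulary. Assuming the body is wrapped in the illustrative CreateGPT3SmallModel function shown above, a minimal usage sketch could look like the following; the hyperparameter values are placeholders, and DebugStructure is the CAI Neural API method used in other examples to print the network layer by layer:

  program Gpt3SmallUsageSketch;
  uses
    neuralnetwork; // CAI Neural API unit declaring TNNet and THistoricalNets
  var
    NN: THistoricalNets;
  begin
    // Placeholder hyperparameters: the real context size, vocabulary size and
    // embedding dimension depend on the tokenizer and training setup.
    NN := CreateGPT3SmallModel({pContextSize=}81, {pVocabSize=}3000, {pEmbedDim=}768);
    NN.DebugStructure(); // print the resulting layer stack
    NN.Free;
  end.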

At this point in time, there is no efficient GPU implementation for Pascal. Although the model is named GPT-3 Small, training it requires large amounts of RAM and CPU power. If you would like to watch the magic (training) happen in front of your eyes with the Tiny Stories dataset, this model can be run very slowly on a Google Colab high-RAM, CPU-based environment: GPT-3 Small for Pascal (Open In Colab).

Although it is called GPT-3 Small, training it may be too resource-demanding for a quick experiment; simpler NLP examples are more practical on an average computer.