
ng-video-lecture

public
3697 stars
975 forks
37 issues

Commits

List of commits on branch master.
Verified
52201428ed7b46804849dea0b3ccf0de9df1a5c3

Merge pull request #9 from Andrei-Aksionov/feature/shape_description_fix

karpathy committed 2 years ago
Unverified
4c8e9028cc4fe417100e1e27db077fca0e0477f2

Clarify shape descriptions inside forward method

Andrei-Aksionov committed 2 years ago
Unverified
d38c865a0e37327925e32b7bafc97b45e7e137ca

adding this even though the first gpt video didn't cover it, because init is very important to good performance. will cover it in followup video in more detail

karpathy committed 2 years ago
Unverified
5301c27d52dace5699faedb3a67bc29ebfd89ea6

soften the wording a bit to not scare people too much

karpathy committed 2 years ago
Unverified
dfc4eceba4f7e3cdce1b17d81a3c1fa763446a5e

attach a note about init

karpathy committed 2 years ago
Unverified
ddf200a2dba8a2a74121cd86bf51706f28c753e3

Merge branch 'master' of https://github.com/karpathy/ng-video-lecture

karpathy committed 2 years ago

README

The README file for this repository.

nanogpt-lecture

Code created in the Neural Networks: Zero To Hero video lecture series, specifically in the first lecture on nanoGPT. Publishing it here as a GitHub repo so people can easily hack on it, walk through its git log history, etc.

NOTE: sadly I did not go too much into model initialization in the video lecture, but it is quite important for good performance. The current code will train and work fine, but its convergence is slower because it starts off in a not-great spot in weight space. Please see the "# init all weights" comment in nanoGPT's model.py, and especially how it calls the _init_weights function. Even more sadly, the code in this repo is a bit different in how it names and stores the various modules, so it's not possible to directly copy-paste that code here. My current plan is to publish a supplementary video lecture that covers these parts, and then push the exact code changes to this repo. For now I'm keeping it as is, so that it is almost exactly what we actually covered in the video.
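To give a rough feel for why the starting point matters, here is a minimal, standard-library-only sketch. It is not code from this repo or from nanoGPT. It treats the output logits of an untrained model as Gaussian noise and compares the average cross-entropy loss at two noise scales. The vocabulary size of 65, the two standard deviations, and the helper names `cross_entropy` and `mean_initial_loss` are all assumptions made for illustration.

```python
import math
import random


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_initial_loss(vocab_size, logit_std, trials=2000, seed=0):
    """Average loss of a hypothetical untrained model whose logits are Gaussian noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        logits = [rng.gauss(0.0, logit_std) for _ in range(vocab_size)]
        target = rng.randrange(vocab_size)  # random "correct" token
        total += cross_entropy(logits, target)
    return total / trials


vocab_size = 65  # assumed character-level vocabulary size, illustration only
uniform_loss = math.log(vocab_size)  # loss of a model that predicts uniformly

small = mean_initial_loss(vocab_size, logit_std=0.02)  # small, well-scaled logits
large = mean_initial_loss(vocab_size, logit_std=3.0)   # large, confidently wrong logits

print(f"uniform baseline: {uniform_loss:.3f}")
print(f"logit std 0.02:   {small:.3f}")
print(f"logit std 3.0:    {large:.3f}")
```

With small logits the model starts close to the uniform baseline of ln(vocab_size). With large logits it starts out confidently wrong, and the early part of training is spent undoing that rather than learning anything useful. Scaling the initial weights down is one common way to keep the initial logits small; how nanoGPT actually does this is best read from its model.py directly.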

License

MIT