
Torchessian

This repository provides a tool for analyzing the full Hessian of the loss function of a neural network. For the moment, only single-GPU mode is supported; a distributed version is planned for the future.
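Analyses like this never materialize the full Hessian; they rely on Hessian-vector products, which PyTorch's autograd can compute with two backward passes. A minimal sketch of that primitive (the function name and signature here are illustrative, not the repository's actual API):

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec, where H is the Hessian of `loss` w.r.t. `params`,
    without ever forming H explicitly."""
    # First backward pass; create_graph=True makes the gradient differentiable.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    # Second backward pass on the scalar (grad . vec) yields H @ vec.
    hv = torch.autograd.grad(flat_grad @ vec, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv]).detach()
```

For a quadratic loss 0.5 * w^T A w, the gradient is A w and the Hessian is A, so the product above reduces to A @ vec, which makes the function easy to sanity-check.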

Motivation

I found this article very interesting and wanted to reproduce its results quickly. I implemented a batch-mode spectrum estimation in order to get results even faster.
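Spectrum estimation of this kind is typically done with the Lanczos algorithm: run a few iterations using only Hessian-vector products, then read eigenvalue estimates (Ritz values) and their weights off a small tridiagonal matrix. A NumPy sketch under that assumption (`lanczos_spectrum` and its parameters are illustrative names, not the repository's API):

```python
import numpy as np

def lanczos_spectrum(hvp, dim, m=90, seed=0):
    """Estimate the eigenvalue spectrum of a symmetric operator.
    `hvp(v)` must return the Hessian-vector product H @ v.
    Returns Ritz values (eigenvalue estimates) and their weights."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    V = np.zeros((m, dim))        # Lanczos basis vectors
    alpha = np.zeros(m)           # tridiagonal diagonal
    beta = np.zeros(m - 1)        # tridiagonal off-diagonal
    V[0] = v
    for j in range(m):
        w = hvp(V[j])
        alpha[j] = V[j] @ w
        w -= alpha[j] * V[j]
        if j > 0:
            w -= beta[j - 1] * V[j - 1]
        # Full reorthogonalization against the basis, for numerical stability.
        w -= V[: j + 1].T @ (V[: j + 1] @ w)
        if j < m - 1:
            b = np.linalg.norm(w)
            if b < 1e-12:  # invariant subspace found; stop early
                alpha, beta, V = alpha[: j + 1], beta[:j], V[: j + 1]
                break
            beta[j] = b
            V[j + 1] = w / b
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    weights = evecs[0] ** 2  # spectral weights for the density estimate
    return evals, weights
```

With m much smaller than the parameter count, the weighted Ritz values give a cheap approximation of the full eigenvalue density; when m equals the operator's dimension, they recover the exact spectrum.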

For instance, when analyzing the impact of batchnorm layers in a ResNet-18 architecture, I found the following spectrum on the CIFAR-10 test set:

Note: both architectures (i.e., with and without batchnorm) were trained until nearly reaching a global optimum, i.e., at least 98% accuracy.

Spectrum of a single batch

(figure: spectrum estimated from a single batch)

Spectrum of the entire test dataset

(figure: spectrum estimated from the full test set)

The results are quite similar, and the conclusion is the same: the batchnorm layers appear to suppress the large positive eigenvalues, which makes the training process easier.