Stochastic_LatentDirchletAnalysis

Commits

List of commits on branch master.
  • 9f859656d9dcb7c5c027cd1ce6dd90821f84c032: Update README.md (SourangshuGhosh, committed 4 years ago)
  • ad290302c06ae90e13ec0bbe62cdb3084e360b15: Update Stochastic LDA Algorithm (SourangshuGhosh, committed 4 years ago)
  • eccd247531612be5742f11b5e1d57228202aa85d: Delete stochastic_lda.py (SourangshuGhosh, committed 4 years ago)
  • c9a7b4e01e1c0538ca65d3163840859eabb6cb11: change to hdp (SourangshuGhosh, committed 4 years ago)
  • 5241a5c04b41df48524f72cd55a5865511d59249: change to hdp (SourangshuGhosh, committed 4 years ago)
  • a5f07d0d71c70ff980366f0165429476d29dad88: Add files via upload (SourangshuGhosh, committed 4 years ago)

README

Stochastic Variational Inference for Latent Dirichlet Allocation

The code structure follows the OnlineVB code provided by Matthew D. Hoffman (mdhoffma@cs.princeton.edu), and the algorithm is as described in Hoffman's paper below.
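As a rough illustration of the stochastic step in this kind of algorithm, the sketch below shows only the global update of the topic-word parameters from one mini-batch. It is not the code in this repository: the names learning_rate, svi_global_step, lam, sstats, tau0, kappa and eta are hypothetical, the defaults are arbitrary, and the per-document local step that would produce sstats is assumed to exist elsewhere.

```python
def learning_rate(t, tau0=1.0, kappa=0.7):
    """Step size rho_t = (tau0 + t) ** -kappa.

    kappa in (0.5, 1] is the usual condition for convergence;
    tau0 >= 0 down-weights the earliest iterations.
    """
    return (tau0 + t) ** (-kappa)


def svi_global_step(lam, sstats, t, num_docs, batch_size,
                    eta=0.01, tau0=1.0, kappa=0.7):
    """Blend the current topics with a noisy estimate from one mini-batch.

    lam      -- K x V nested list, current variational topic-word parameters
    sstats   -- K x V nested list, expected word counts per topic for the batch
                (assumed to come from a local per-document step, not shown)
    num_docs -- total number of documents in the corpus
    """
    rho = learning_rate(t, tau0, kappa)
    scale = num_docs / batch_size  # pretend the batch is the whole corpus
    return [
        [(1.0 - rho) * l + rho * (eta + scale * s)
         for l, s in zip(lam_k, ss_k)]
        for lam_k, ss_k in zip(lam, sstats)
    ]


if __name__ == "__main__":
    # Toy example: 2 topics, 3 vocabulary words, batch of 5 out of 100 docs.
    lam = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    sstats = [[3.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
    print(svi_global_step(lam, sstats, t=1, num_docs=100, batch_size=5))
```

Early iterations take large steps toward the mini-batch estimate, and later ones take smaller steps as rho shrinks, which is what lets the updates settle despite the noise from sampling documents.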

Based on the following papers:

Work in progress: SVI for the HDP, as described in the second paper above, is also planned.

How to Use

See the help text by running: python stochastic_lda.py -h

You will need:

  • A file [dictionary.csv] containing your vocabulary
  • A file [doclist.txt] containing the list of documents in the directory that you want to sample from
  • Documents as plain .txt files; for now, no pre-processing is required
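The inputs listed above could be read along these lines. This is a sketch under assumptions, not the repository's loader: it assumes dictionary.csv holds one vocabulary word in the first column of each row and doclist.txt holds one document path per line, and the helper names load_vocab, read_doclist and parse_doc are hypothetical.

```python
import csv
import re
from collections import Counter


def load_vocab(lines):
    """Map each word in the first CSV column to an integer id (assumed layout)."""
    vocab = {}
    for row in csv.reader(lines):
        if row and row[0].strip():
            vocab.setdefault(row[0].strip().lower(), len(vocab))
    return vocab


def read_doclist(lines):
    """Return the non-empty lines, assumed to be one document path each."""
    return [line.strip() for line in lines if line.strip()]


def parse_doc(text, vocab):
    """Turn raw text into parallel lists of word ids and their counts.

    Words outside the vocabulary are dropped; tokenisation is a simple
    lowercase letters-only split, chosen here only for illustration.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(vocab[w] for w in tokens if w in vocab)
    ids = sorted(counts)
    return ids, [counts[i] for i in ids]


def load_corpus(dictionary_path, doclist_path):
    """Read both input files and parse every listed document."""
    with open(dictionary_path, newline="", encoding="utf-8") as f:
        vocab = load_vocab(f)
    with open(doclist_path, encoding="utf-8") as f:
        paths = read_doclist(f)
    docs = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            docs.append(parse_doc(f.read(), vocab))
    return vocab, docs
```

Under these assumptions, a call such as load_corpus("dictionary.csv", "doclist.txt") would return the vocabulary and one (ids, counts) pair per document, which is the bag-of-words form that variational updates for LDA typically consume.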

For classwork, work in progress...

  • [x] Basic initial implementation
  • [x] Debug for common corpus
  • [x] Support Command-Line Usage for user-defined test mode and normal mode
  • [x] Run on own data
  • [ ] Implement HDP