
JARVIS

public · 0 stars · 0 forks · 0 issues

Commits

List of commits on branch master.
- ee2ce8c2f71b74e50c5a1bdc907988d075b69399 (unverified): "Visualization friendly change using scopes, + custom created to model Kim's network exactly" (uultrons, 9 years ago)
- 7733950baabbf04d21b7c3cc2127b67a84a73f5f (unverified): "Visualization friendly change using scopes, + custom created to model Kim's network exactly" (uultrons, 9 years ago)
- 12c7b5d07607e0b3be532427339d11917f20efe9 (unverified): "Visualization friendly change using scopes, + custom created to model Kim's network exactly" (uultrons, 9 years ago)
- 5562a80c7c6c9c25c024f328623d6426c505b16f (unverified): "Visualization friendly change using scopes, + custom created to model Kim's network exactly" (uultrons, 9 years ago)
- 06bc56b82a4d61f45a7cc341e95be10ab5a2948f (unverified): "Tensor Board Related Changes" (uultrons, 9 years ago)
- a82efc4c8273055b9a090c762e205b22e3281e9c (unverified): "Tensor Board Related Changes" (uultrons, 9 years ago)

README

The README file for this repository.

# Semantics Based Sentence Classification

## Problem Statement

Perform an objective comparison of two prominent methods for semantics-based sentence classification: "Tree-Structured Long Short-Term Memory Networks" [2] and "Convolutional Neural Networks for Sentence Classification" [3]. We also plan to extend the multi-channel design (static and task-based vector usage) of the latter work [3] to explore compound multi-channel neural networks, i.e., vectors trained in parallel by two different neural architectures.
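
As a concrete reference for the multi-channel design, here is a minimal sketch of a Kim-style two-channel CNN [3] in tf.keras. This is an illustrative approximation, not the code in this repository: the vocabulary size, sequence length, filter widths, and the random `pretrained_vectors` matrix standing in for word2vec embeddings [4, 5] are placeholder assumptions, and each filter bank is applied to the two channels separately and then concatenated, a simplification of [3], where each filter spans both channels.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sizes; real values would come from the SST vocabulary
# and the pretrained word2vec vectors [4, 5].
vocab_size, embed_dim, max_len, num_classes = 20000, 300, 56, 5
pretrained_vectors = np.random.rand(vocab_size, embed_dim).astype("float32")

tokens = layers.Input(shape=(max_len,), dtype="int32")

# Channel 1: pretrained word vectors, kept static (frozen) during training.
static = layers.Embedding(
    vocab_size, embed_dim,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained_vectors),
    trainable=False)(tokens)
# Channel 2: the same vectors, fine-tuned for the task.
tuned = layers.Embedding(
    vocab_size, embed_dim,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained_vectors),
    trainable=True)(tokens)

# Convolutions of several filter widths over each channel,
# max-pooled over time to a fixed-size feature per filter.
pooled = []
for width in (3, 4, 5):
    for channel in (static, tuned):
        conv = layers.Conv1D(100, width, activation="relu")(channel)
        pooled.append(layers.GlobalMaxPooling1D()(conv))

features = layers.Dropout(0.5)(layers.Concatenate()(pooled))
outputs = layers.Dense(num_classes, activation="softmax")(features)

model = tf.keras.Model(tokens, outputs)
model.compile(optimizer="adadelta",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Replacing the second embedding channel with vectors produced by a different architecture is one way to realize the compound multi-channel idea described above.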

## Input/Output Behaviour

The Stanford Sentiment Treebank classifies sentiment into five classes using the following cut-offs: (0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], and (0.8, 1.0] for very negative, negative, neutral, positive, and very positive, respectively.** A small helper that reproduces this mapping is sketched after the table.

| Sentence | Sentiment | Sentiment Label Class |
| --- | --- | --- |
| the worst movie of 2002 | 0.013889 | 1 |
| (Allen's) been making piffle for a long while | 0.30556 | 2 |
| Walt Becker's film pushes all the demographically appropriate comic buttons | 0.80556 | 5 |
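
Since the cut-offs are half-open intervals, the mapping from a raw sentiment score to a class label is mechanical; the hypothetical helper below (not part of this repository) reproduces the three rows of the table:

```python
from bisect import bisect_left

def sentiment_class(score: float) -> int:
    """Map an SST sentiment score in (0, 1] to class 1..5.

    bisect_left keeps a score equal to a boundary in the lower interval,
    matching the half-open cut-offs (0, 0.2], (0.2, 0.4], (0.4, 0.6],
    (0.6, 0.8], (0.8, 1.0].
    """
    return bisect_left([0.2, 0.4, 0.6, 0.8], score) + 1

# The three example sentences from the table above.
assert sentiment_class(0.013889) == 1   # very negative
assert sentiment_class(0.30556) == 2    # negative
assert sentiment_class(0.80556) == 5    # very positive
```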

## Evaluation Metrics

Test set accuracy on the Stanford Sentiment Treebank [1].

## Data Set

We plan to use the Stanford Sentiment Treebank [1]. There are two subtasks:

  1. Binary classification of sentences. We plan to use the standard train/dev/test splits of 6920/872/1821.
  2. Fine-grained classification over five classes: very negative, negative, neutral, positive, and very positive. We plan to use the standard train/dev/test splits of 8544/1101/2210.

## Baseline

Binary classification: 82.4% test set accuracy [2]. Fine-grained classification: 43.2% test set accuracy [2].

## Oracle

By definition of an oracle, test set accuracy should be 100% (human-level comprehension). So far, the best test set accuracy achieved by any model is that of the "Tree-Structured Long Short-Term Memory Networks" of [2]: 86.9% on binary classification and 50.6% on fine-grained classification.

## Challenges

Sentence classification is an inherently difficult problem because no model perfectly captures human-level comprehension of language. Traditional methods such as bag-of-words fail to generalize because the meaning of a sentence is order-sensitive (e.g., "cats climb trees" vs. "trees climb cats"). Order-sensitive models, on the other hand, have limited scalability: a larger phrase window makes the model harder to train with conventional methods. Fully connected neural architectures are hard to train and require very large data sets because of their inherently large number of parameters. Traditional RNNs suffer from vanishing and exploding gradients, and recursive neural networks require parse-tree structure information. The convolutional [3] and tree-structured LSTM [2] methods that we explore in depth address some of these issues, yet the accuracies reported in those works clearly indicate room for further improvement. In the current work we examine these issues in depth, present the essence of each technique, derive insights, and, where possible, try to improve the results. The gated Child-Sum Tree-LSTM update from [2], which mitigates the vanishing-gradient problem while exploiting parse-tree structure, is sketched below.
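
For reference, here is a minimal numpy sketch of a single Child-Sum Tree-LSTM node update following the equations in [2]. This is an illustrative re-derivation, not the repository's code: the `child_sum_node` function name, the parameter dictionaries `W`, `U`, `b`, and the random placeholder inputs are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def child_sum_node(x, child_h, child_c, W, U, b):
    """One Child-Sum Tree-LSTM node update in the style of [2].

    x: word vector at this node; child_h, child_c: lists of child hidden
    and cell states; W, U, b: parameter dicts keyed by gate name.
    """
    # Sum of child hidden states (zero vector at a leaf).
    h_tilde = np.sum(child_h, axis=0) if child_h else np.zeros_like(b["i"])
    i = sigmoid(W["i"] @ x + U["i"] @ h_tilde + b["i"])   # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_tilde + b["o"])   # output gate
    u = np.tanh(W["u"] @ x + U["u"] @ h_tilde + b["u"])   # candidate update
    # One forget gate per child, conditioned on that child's own hidden
    # state, so the cell can keep or drop each subtree independently.
    f = [sigmoid(W["f"] @ x + U["f"] @ h_k + b["f"]) for h_k in child_h]
    c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
    h = o * np.tanh(c)
    return h, c

# Placeholder parameters and inputs, just to exercise the update.
d = 4
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(d, d)) for g in "ifou"}
U = {g: rng.normal(size=(d, d)) for g in "ifou"}
b = {g: np.zeros(d) for g in "ifou"}
h, c = child_sum_node(rng.normal(size=d),
                      [rng.normal(size=d), rng.normal(size=d)],
                      [rng.normal(size=d), rng.normal(size=d)], W, U, b)
```

Because each child has its own forget gate, the cell state can retain or discard individual subtrees, which is what lets gradients flow along tree branches without vanishing as quickly as in a plain sequential RNN.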

## Related Work

[1] Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank. In Proceedings of EMNLP, 2013.

[2] Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved Semantic Representations from Tree-Structured Long Short-Term Memory Networks. In Proceedings of ACL, 2015.

[3] Yoon Kim. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP, 2014.

[4] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the ICLR Workshop, 2013.

[5] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.

** Described in the Stanford Sentiment Treebank dataset documentation (http://nlp.stanford.edu/sentiment/index.html).