GitXplorerGitXplorer
j

vaderXquery

public
0 stars
0 forks
0 issues

Commits

List of commits on branch master.
Unverified
ccc82076b81e733b613d409ec24441eaa9d4251e

Moved lookup tables and constants to own module

committed 7 years ago
Unverified
39a8cfcbd5756dd1f6a28fd18e2e739cdcf93219

small simplification

committed 7 years ago
Unverified
e1878ae418172284cc1eb5ab24b368848eca9a86

made function var more informative

committed 7 years ago
Unverified
8a32eedfd2ba431a4de02f816c337ac77c508914

Updated remove punctuation

committed 7 years ago
Unverified
afd957c594e438c89fec1b9154d90c85019a9df1

Converted PUNC_LIST to a dictionary

committed 7 years ago
Unverified
3722f1edfbcaa6bf3add599551ffa8eaec47abda

Turned NEGATE into a map.

committed 7 years ago

README

The README file for this repository.

vadarXquery

This is an XQuery implementation of vaderSentiment.

Why?

Good question. The Python implementation works well. My particular problem involved a large dataset already existing on a MarkLogic server. Given the time I had to work on the project and the number of items that needed analysis porting, this library to XQuery made sense.

Installation

This XQuery library is intended to be installed as a submodule within your project. I would suggest something along the lines of:

$ git submodule add <repo-url>

Supporting Resources

XRAY

Unit tests are written using the xray library. I recommend installing this as a submodule to the parent project.

vader-lexicon.txt

vaderSentiment contains a lexicon that it uses to determine a word's valence score. I've created a transform that will ingest the vader-lexicon and transform it into an xml document that this library uses.

This transform is intended to be called by MLCP. If you're using mlGradle, add a task like:

task importLexicon(type: com.marklogic.gradle.task.MlcpTask){
    classpath = configurations.mlcp
    command = "IMPORT"
    port = Integer.parseInt(mlXccPort)
    database = mlAppConfig.contentDatabaseName
    input_file_path = "src/main/ml-modules/root/modules/vaderXquery/*.txt"
    output_uri_replace = ".*vader, '/vader'"
    output_uri_suffix = ".xml"
    document_type = "text"
    input_file_type = "documents"
    transform_module = "/modules/vaderXquery/transform-lexicon.xqy"
    transform_namespace = "http://vaderSentiment/vader/transform"
}

You'll need to adjust accordingly if you placed this submodule elsewhere.

General Use

The general entry point for this module is polarity_scores(). This function expects a sentence, and returns a map of scores. The score you most likely care about is the compound score. Please see vaderSentiment for a full description of what this means.

Repsonse Format

polarity_scores returns a map that looks like:

map:new((
    map:entry("neg", xs:decimal()),
    map:entry("pos", xs:decimal()),
    map:entry("neu", xs:decimal()),
    map:entry("compound", xs:decimal())
))

In general, the compound score is what you care about. This is a value that ranges from -1 to +1. A negative score indicates a negative sentiment, a positive score a positive sentiment.