markov-impersonator


Commits

List of commits on branch main.
Unverified
96bacdffa2910cbe767bba48f410a8104a984933

explain ntlk data download and nix setup in README for now

committed 4 years ago
Verified
614b90857b6f55e8e02da6d07a3cd02dcc1bf86a

Merge pull request #2 from combinatorist/feature/grammar-aware

combinatorist committed 5 years ago
Unverified
087c04ed7f346ab6fa14291a5f0a2f216da2650b

use nltk to ensure parts of speech are valid

committed 5 years ago
Unverified
c81720f131f4ffdaf018d901e1be0c557ca69a06

remove redundant def of compress

committed 5 years ago
Unverified
feca6b1f30a12f02873983ae0e4ac23f7dfa845c

add useage instructions to README

committed 5 years ago
Unverified
f4dd00e31e2645220c084d706b19c921f0202427

merge local code with github license an gitignore

committed 5 years ago

README

The README file for this repository.

markov-impersonator

prerequisites

This requires the nltk module, which downloads its data separately.

It downloaded the following (in the home directory):

nltk_data/
├── corpora
│   └── treebank
│       ├── combined
│       ├── parsed
│       ├── raw
│       └── tagged
├── taggers
│   └── averaged_perceptron_tagger
└── tokenizers
    └── punkt
        └── PY3

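I believe the third-level items (treebank, averaged_perceptron_tagger, punkt) need to be installed like so:

import nltk
# fetch the treebank corpus, the perceptron tagger, and the punkt tokenizer
nltk.download(['treebank', 'averaged_perceptron_tagger', 'punkt'])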
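To check whether the data shown in the tree above is already in place before downloading, something like the following might help. It is only a sketch using the standard library: it assumes the default ~/nltk_data location and the unpacked directory layout listed above, and the name missing_nltk_data is hypothetical (it is not part of nltk or of this repository).

```python
from pathlib import Path

# Third-level items from the tree above, relative to the nltk_data root.
EXPECTED = (
    "corpora/treebank",
    "taggers/averaged_perceptron_tagger",
    "tokenizers/punkt",
)


def missing_nltk_data(root=None):
    """Return the expected data directories that are absent under root.

    root defaults to ~/nltk_data, which is an assumption based on the
    layout described in this README.
    """
    root = Path(root) if root is not None else Path.home() / "nltk_data"
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_nltk_data()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all expected nltk data found")
```

If anything is reported missing, the nltk.download call above should fetch it.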

A custom Nix expression could probably do this data download. Perhaps someone else will include this soon: https://github.com/NixOS/nixpkgs/issues/56094#issuecomment-638247266.

usage

Start a Nix shell with:

nix-shell -p 'python35.withPackages(ps: with ps; [ nltk ])'

Then run the following inside:

python3 __init__.py <your input file path>

todo

  • ignore case (at least in analysis)
  • consider frequency of suffixes (or rather compress on them)
  • consider grammar tree: https://www.nltk.org/
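As an illustration of the first todo item, here is one way case could be ignored during analysis while keeping the original casing in the output. This is a hedged sketch, not the code in this repository: it assumes a simple first-order, word-level chain, and the names build_transitions and generate are hypothetical.

```python
import random
from collections import defaultdict


def build_transitions(words):
    """Map each lowercased word to the original-case words that follow it."""
    transitions = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Lowercase only the lookup key, so "The" and "the" share followers.
        transitions[current.lower()].append(following)
    return transitions


def generate(transitions, start, length, rng=random):
    """Walk the chain from start, looking up each step case-insensitively."""
    out = [start]
    for _ in range(length - 1):
        choices = transitions.get(out[-1].lower())
        if not choices:
            break  # dead end: the last word never had a follower
        out.append(rng.choice(choices))
    return out


if __name__ == "__main__":
    words = "The cat saw the dog and the cat ran".split()
    table = build_transitions(words)
    print(" ".join(generate(table, "The", 6)))
```

Keying on the lowercased word merges the statistics for differently cased forms, which is the "analysis" half of the todo item; the generated text still uses whatever casing appeared in the input.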