py-phrasematcher


Commits

List of commits on branch master (all committed by ggeovedi, 8 years ago):

  • 83b0fc13a851d9d083bc3eb1b8697aec08693df2: Update README.md
  • 83388ba415228722ca997b52e8aa5d1ca07fa46d: Update phrasematcher.py
  • b40b7022206244cf41ef862d192fb6740cb37d79: Update README.md
  • db168ae2e31815d773ed4007f8f93897337792fb: using sortedcontainers
  • f757d681d5933f00fe504ce114c677da89c56840: Update phrasematcher.py
  • e7cab1a71aed4fd5e9496f68d892c73730314916: Update phrasematcher.py

py-phrasematcher

Fast and resource-friendly Python phrase matcher.

Requirements

  • sortedcontainers

Usage

The matcher takes a plain-text pattern file as input, with one phrase per line as in this example:

sepak bola
pencetak gol terbanyak
sir bobby charlton
bobby charlton
musim lalu
musim ini
satu di antara
kesalahan defensif
kesalahan defensif terbesar
...
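A pattern file like the one above can be generated from a list of phrases. The following is a minimal sketch, not part of the library: `write_patterns` is a hypothetical helper, and the lowercasing and whitespace normalisation are assumptions based only on how the example patterns look.

```python
from pathlib import Path


def write_patterns(path, phrases):
    """Write one phrase per line to `path` and return the lines written.

    Assumptions (not confirmed by the library): phrases are lowercased,
    tokens are separated by single spaces, and blank entries are skipped.
    """
    lines = [" ".join(p.lower().split()) for p in phrases if p.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


if __name__ == "__main__":
    # Writes three of the example patterns to patterns.txt.
    write_patterns("patterns.txt", ["sepak bola", "musim lalu", "satu di antara"])
```

The resulting file can then be passed as `pattern_file` when the matcher is first created.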

Initial usage

from phrasematcher import PhraseMatcher

# First run: build the database in 'model_dirname' from the pattern file.
matcher = PhraseMatcher('model_dirname', pattern_file='patterns.txt')

text = '''menurut analisa squawka , mu adalah satu di antara lima kesebelasan
          dengan kesalahan defensif terbesar di epl musim lalu -- walau hanya
          tiga gol yang masuk ke gawang mereka dari sejumlah kesalahan itu .'''

# Print each match found in the text.
for match in matcher.match(text):
    print(match)

Reusing the database

matcher = PhraseMatcher('model_dirname')

Why?

Short answer: I'm bored.

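Long answer: Looking up every n-gram of the input text is a waste of time and resources. Instead, the matcher rejects any candidate that contains an out-of-vocabulary (OOV) token, looks up only the first and last tokens of each remaining candidate, and only then checks whether the candidate phrase is in the hash table.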
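The idea can be illustrated with a small pure-Python sketch. This is not the code in phrasematcher.py: the names `build_index` and `find_matches`, the whitespace tokenisation, and the use of plain sets instead of an on-disk database are all assumptions made for illustration.

```python
def build_index(patterns):
    """Build the three lookup structures used by the sketch below."""
    vocab = set()    # every token that occurs in some pattern
    ends = set()     # (first token, last token) of every pattern
    phrases = set()  # full patterns as token tuples (the "hash table")
    max_len = 0
    for line in patterns:
        tokens = tuple(line.split())
        if not tokens:
            continue
        vocab.update(tokens)
        ends.add((tokens[0], tokens[-1]))
        phrases.add(tokens)
        max_len = max(max_len, len(tokens))
    return vocab, ends, phrases, max_len


def find_matches(text, index):
    """Yield every pattern that occurs in `text`, in order of appearance."""
    vocab, ends, phrases, max_len = index
    tokens = text.split()
    for i, first in enumerate(tokens):
        if first not in vocab:
            continue  # a candidate cannot start with an OOV token
        for j in range(i, min(i + max_len, len(tokens))):
            if tokens[j] not in vocab:
                break  # an OOV token rules out all longer candidates
            # Cheap filter: look up only the first and last tokens.
            if (first, tokens[j]) in ends:
                candidate = tuple(tokens[i:j + 1])
                # Full check against the hash table of patterns.
                if candidate in phrases:
                    yield " ".join(candidate)


PATTERNS = [
    "sepak bola", "pencetak gol terbanyak", "sir bobby charlton",
    "bobby charlton", "musim lalu", "musim ini", "satu di antara",
    "kesalahan defensif", "kesalahan defensif terbesar",
]
TEXT = ("menurut analisa squawka , mu adalah satu di antara lima kesebelasan "
        "dengan kesalahan defensif terbesar di epl musim lalu -- walau hanya "
        "tiga gol yang masuk ke gawang mereka dari sejumlah kesalahan itu .")

if __name__ == "__main__":
    for phrase in find_matches(TEXT, build_index(PATTERNS)):
        print(phrase)
```

In this sketch the inner loop stops at the first OOV token, so most positions in ordinary text are discarded after a single set lookup, and the full phrase is hashed only when its first and last tokens already agree with some pattern.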