flashtext-kt

Kotlin port of flashtext

A modified version of the Aho-Corasick algorithm that matches only whole words rather than arbitrary substrings [1].

[1] https://arxiv.org/abs/1711.00046
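The whole-word idea can be illustrated with a small trie. The sketch below is not the code in flashtext/TrieNode.kt or flashtext/KeywordProcessor.kt; the names Match, Node and WholeWordMatcher are made up for illustration, and it assumes that letters, digits and '_' are the word characters. A keyword is accepted only when it starts and ends at a word boundary, and the longest such keyword wins.

```kotlin
// Hypothetical sketch; these names do not come from this repository.
data class Match(val value: String, val offset: Int, val length: Int)

class Node {
    val children = HashMap<Char, Node>()
    var cleanName: String? = null // non-null marks the end of a stored keyword
}

class WholeWordMatcher {
    private val root = Node()

    // Store a keyword character by character and remember its clean name.
    fun add(keyword: String, cleanName: String) {
        var node = root
        for (ch in keyword) node = node.children.getOrPut(ch) { Node() }
        node.cleanName = cleanName
    }

    // Assumption: letters, digits and '_' count as word characters.
    private fun isWordChar(ch: Char) = ch.isLetterOrDigit() || ch == '_'

    fun extract(text: String): List<Match> {
        val found = ArrayList<Match>()
        var i = 0
        while (i < text.length) {
            // A match may only start at the beginning or right after a non-word character.
            if (i > 0 && isWordChar(text[i - 1])) { i++; continue }
            var node = root
            var j = i
            var bestEnd = -1
            var bestName: String? = null
            while (j < text.length) {
                node = node.children[text[j]] ?: break
                j++
                val name = node.cleanName
                // Accept only if the keyword ends at a word boundary; keep the longest.
                if (name != null && (j == text.length || !isWordChar(text[j]))) {
                    bestEnd = j
                    bestName = name
                }
            }
            if (bestName != null) {
                found.add(Match(bestName, i, bestEnd - i))
                i = bestEnd
            } else {
                i++
            }
        }
        return found
    }
}

fun main() {
    val matcher = WholeWordMatcher()
    matcher.add("PM", "product manager")
    matcher.add("java", "java")
    // "java" inside "javascript" is skipped because it does not end at a word boundary.
    println(matcher.extract("A PM who writes java, not javascript"))
}
```

Each position in the text is visited a bounded number of times per keyword length, so the cost depends on the text rather than on the number of stored keywords, which is the property the paper emphasizes.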


See example.kt for usage until more documentation is added.

Example Usage:

import flashtext.KeywordProcessor

fun main(args: Array<String>) {
    val keywordProcessor = KeywordProcessor(caseSensitive=true)
    // addKeyword(keyword, cleanName): the keyword is matched in the text, the clean name is reported
    keywordProcessor.addKeyword("NYC", "New York")
    keywordProcessor.addKeyword("APPL", "Apple")
    // Bulk insert: each map key is a clean name, each value lists the keywords that map to it
    keywordProcessor.addKeywordsFromMap(
        hashMapOf(
            "java" to arrayListOf("java_2e", "java programing"),
            "product manager" to arrayListOf("PM", "product manager")
        )
    )
    println("Terms in Trie: ${keywordProcessor.size()}")
    println("Data: ${keywordProcessor.getAllKeywords()}")

    val text = "I am a PM for a java_2e platform working from APPL, NYC"
    println("Text: $text")
    // extractKeywords reports the clean name, offset and length of every match
    println("Extract: ${keywordProcessor.extractKeywords(text)}")
    // replaceKeywords substitutes every matched keyword with its clean name
    println("Replace: ${keywordProcessor.replaceKeywords(text)}")
}

Compile:

kotlinc flashtext/TrieNode.kt flashtext/KeywordProcessor.kt example.kt -d flashtext.jar

Run the example:

kotlin -classpath flashtext.jar ExampleKt

Output:

Terms in Trie: 6
Data: {APPL=Apple, java_2e=java, product manager=product manager, NYC=New York, java programing=java, PM=product manager}
Text: I am a PM for a java_2e platform working from APPL, NYC
Extract: [(value=product manager, offset=7, length=2), (value=java, offset=16, length=7), (value=Apple, offset=46, length=4), (value=New York, offset=52, length=3)]
Replace: I am a product manager for a java platform working from Apple, New York

Todo:

  • [ ] Make it a proper package, ideally usable via Gradle (help wanted)
  • [ ] Write tests
  • [ ] Compute benchmarks for Kotlin Regex vs this module
  • [ ] Profile memory for a bunch of dictionaries
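For the Regex benchmark item, one possible baseline is a single alternation of escaped keywords wrapped in word boundaries. This is only a sketch under assumptions: regexReplace is a hypothetical helper that is not part of this module, it takes a keyword-to-clean-name map, and `\b` is only an approximation of the module's own boundary rule. It uses nothing beyond kotlin.text.Regex from the standard library.

```kotlin
// Hypothetical baseline for benchmarking; not part of flashtext-kt.
fun regexReplace(text: String, keywords: Map<String, String>): String {
    // An empty alternation would match the empty string, so return early.
    if (keywords.isEmpty()) return text
    // Longest keywords first so that a longer alternative wins over its own prefix.
    val alternation = keywords.keys
        .sortedByDescending { it.length }
        .joinToString("|") { Regex.escape(it) }
    val pattern = Regex("\\b(?:$alternation)\\b")
    // Replace each matched keyword with its clean name.
    return pattern.replace(text) { keywords.getValue(it.value) }
}

fun main() {
    val keywords = mapOf(
        "NYC" to "New York",
        "APPL" to "Apple",
        "java_2e" to "java",
        "PM" to "product manager"
    )
    println(regexReplace("I am a PM for a java_2e platform working from APPL, NYC", keywords))
}
```

Timing this function against replaceKeywords for growing dictionaries would give the comparison the item asks for; the pattern should be built once outside the timed loop if only matching cost is of interest.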

Disclaimer: I am a Kotlin newbie, so any idiomatic Kotlin changes are welcome.