
unicode-graphemes


Commits on branch master

a39333d52204e5e35e75ef0a90bbccc65867e656  Initial revision  (bbhamiltoncx, 8 years ago)
2ba265320e8f51ccc8e38c4e4c85ee65fc72e228  More gitignore  (bbhamiltoncx, 8 years ago)
e258ba61a0bea721d88fa34f9a33220550d57f71  Initial commit  (bbhamiltoncx, 8 years ago)


README for unicode-graphemes

This is a sample Java client for the ANTLR 4 Unicode grapheme cluster parser grammar:

https://github.com/antlr/grammars-v4/tree/master/unicode/graphemes
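A grapheme cluster is what a reader perceives as one character, and in Java it can span many `char`s. The JDK-only sketch below (no ANTLR or unicode-graphemes code; the class name `GraphemeLengths` is hypothetical) measures two strings from the full example further down. The family emoji is 11 UTF-16 units and 7 code points, and the Finnish flag is 4 units and 2 code points, yet each is a single grapheme. Neither `String.length()` nor `codePointCount()` can find those boundaries on its own, which is why a grapheme parser is needed.

```java
public class GraphemeLengths {
  // Describes s as "<UTF-16 units> UTF-16 units, <code points> code points".
  static String describe(String s) {
    // length() counts UTF-16 code units (Java chars);
    // codePointCount() counts Unicode code points.
    return s.length() + " UTF-16 units, " + s.codePointCount(0, s.length()) + " code points";
  }

  public static void main(String[] args) {
    // WOMAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, GIRL (one family emoji)
    String family = "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67";
    // Regional indicators F and I (the flag of Finland)
    String flag = "\uD83C\uDDEB\uD83C\uDDEE";
    System.out.println("family: " + describe(family)); // family: 11 UTF-16 units, 7 code points
    System.out.println("flag: " + describe(flag));     // flag: 4 UTF-16 units, 2 code points
  }
}
```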

To Build and Install

% mvn install

Usage Example

import com.github.bhamiltoncx.UnicodeGraphemeParsing;

public class Example {
  public static void main(String[] strings) {
    // Each command-line argument is segmented separately.
    for (String string : strings) {
      System.out.format("Parsing string: %s%n", string);
      // parse() yields one Result per grapheme cluster, in order.
      for (UnicodeGraphemeParsing.Result grapheme : UnicodeGraphemeParsing.parse(string)) {
        // stringOffset and stringLength are in UTF-16 code units (Java chars),
        // so they can be passed directly to String.substring().
        String s = string.substring(grapheme.stringOffset, grapheme.stringOffset + grapheme.stringLength);
        String type = (grapheme.type == UnicodeGraphemeParsing.Result.Type.EMOJI ? "Emoji" : "Non-Emoji");
        System.out.format("%s: [%s] (offset=%d, length=%d)%n", type, s, grapheme.stringOffset, grapheme.stringLength);
      }
    }
  }
}
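The offsets in each `Result` are UTF-16 code units. The example passes them straight to `String.substring()`, and the full example reports `length=2` for a single emoji. If a caller needs code point positions instead, `String.codePointCount()` can convert them. The sketch below works on a plain string and an (offset, length) pair, not on a `Result`, so it compiles without the library. The class `CodePointRanges`, its `toCodePointRange` method, and the `SAMPLE` constant are hypothetical, not part of unicode-graphemes.

```java
public class CodePointRanges {
  // The input from the full example: abc, grinning face, pile of poo,
  // police officer + dark skin tone + ZWJ + female sign + VS16,
  // family (woman, woman, girl, girl joined by ZWJ), flag of Finland.
  static final String SAMPLE = "abc"
      + "\uD83D\uDE00"
      + "\uD83D\uDCA9"
      + "\uD83D\uDC6E\uD83C\uDFFF\u200D\u2640\uFE0F"
      + "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67"
      + "\uD83C\uDDEB\uD83C\uDDEE";

  // Hypothetical helper: converts the UTF-16 range
  // [utf16Offset, utf16Offset + utf16Length) of s into {codePointOffset, codePointLength}.
  static int[] toCodePointRange(String s, int utf16Offset, int utf16Length) {
    // Code points that precede the range.
    int cpOffset = s.codePointCount(0, utf16Offset);
    // Code points inside the range.
    int cpLength = s.codePointCount(utf16Offset, utf16Offset + utf16Length);
    return new int[] {cpOffset, cpLength};
  }

  public static void main(String[] args) {
    // (offset, length) pairs copied from the full example's output.
    int[][] ranges = {{0, 1}, {1, 1}, {2, 1}, {3, 2}, {5, 2}, {7, 7}, {14, 11}, {25, 4}};
    for (int[] r : ranges) {
      int[] cp = toCodePointRange(SAMPLE, r[0], r[1]);
      System.out.format("UTF-16 offset=%d length=%d -> code point offset=%d length=%d%n",
          r[0], r[1], cp[0], cp[1]);
    }
  }
}
```

For the sample string, the police-officer cluster at UTF-16 (7, 7) maps to code point offset 5, length 5. The flag at (25, 4) maps to offset 17, length 2.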

Full example

% mvn install
% javac -cp target/unicode-graphemes-0.1-SNAPSHOT.jar example/Example.java
% alias parse-graphemes="java -cp $HOME/.m2/repository/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar:$HOME/.m2/repository/com/github/bhamiltoncx/unicode-graphemes/0.1-SNAPSHOT/unicode-graphemes-0.1-SNAPSHOT.jar:example Example"
% parse-graphemes abc😀💩👮🏿‍♀️👩‍👩‍👧‍👧🇫🇮
Parsing string: abc😀💩👮🏿‍♀️👩‍👩‍👧‍👧🇫🇮
Non-Emoji: [a] (offset=0, length=1)
Non-Emoji: [b] (offset=1, length=1)
Non-Emoji: [c] (offset=2, length=1)
Emoji: [😀] (offset=3, length=2)
Emoji: [💩] (offset=5, length=2)
Emoji: [👮🏿‍♀️] (offset=7, length=7)
Emoji: [👩‍👩‍👧‍👧] (offset=14, length=11)
Emoji: [🇫🇮] (offset=25, length=4)
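In the output above, each cluster starts where the previous one ended (0+1=1, 3+2=5, 7+7=14, 14+11=25). The last one ends at 29, the full UTF-16 length of the input, so the clusters cover the string with no gaps or overlaps. A caller could sanity-check a parse against that property. The sketch below does so on plain (offset, length) pairs copied from the output, not on `Result` objects, so it runs without the library. The class `TilingCheck` and method `coversExactly` are hypothetical.

```java
public class TilingCheck {
  // Hypothetical helper: true if the ranges, taken in order, start at 0,
  // are non-empty, leave no gaps or overlaps, and end exactly at totalLength.
  static boolean coversExactly(int[][] ranges, int totalLength) {
    int expectedOffset = 0;
    for (int[] r : ranges) {
      // Each range must begin where the previous one ended.
      if (r[0] != expectedOffset || r[1] <= 0) {
        return false;
      }
      expectedOffset += r[1];
    }
    // The last range must end at the end of the string.
    return expectedOffset == totalLength;
  }

  public static void main(String[] args) {
    // (offset, length) pairs from the full example; the input is 29 UTF-16 units long.
    int[][] ranges = {{0, 1}, {1, 1}, {2, 1}, {3, 2}, {5, 2}, {7, 7}, {14, 11}, {25, 4}};
    System.out.println(coversExactly(ranges, 29)); // prints true
  }
}
```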