GitXplorerGitXplorer
f

duckling

public
4057 stars
722 forks
134 issues

Commits

List of commits on branch main.
Unverified
7520daaeba28691cda8e1b5c3d946028a28fb64b

Add missing copyright header + auto-format NumParser.hs

ppatapizza committed 2 years ago
Unverified
9509e042dc68aeff7b5f80e1a38a4398bdbe4aaa

DE-Numeral-complex-German-numerals (#699)

aandhai committed 2 years ago
Unverified
1faab00741b189b8ae985774b965adfd6e88cdc2

Update Dockerfile to resolve #671 (#685)

ddfuchss committed 3 years ago
Unverified
881d189d4b3e44baa246053d48acd23b08f56d5f

docs: add GH button in support of Ukraine (#687)

ddmitryvinn committed 3 years ago
Unverified
578f4b31cd501af65d22995770a5c160e11eda21

Numeral/DE: add very large german numbers (#682)

bboonto committed 3 years ago
Unverified
ea8a4f6d3b5bac8ffed1ee339126c2fdb375ded5

Numeral/DE: fix eine not being recognized (#684)

bboonto committed 3 years ago

README

The README file for this repository.

Duckling Logo

Duckling Support Ukraine Build Status

Duckling is a Haskell library that parses text into structured data.

"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}

Requirements

A Haskell environment is required. We recommend using stack.

On Linux and MacOS you'll need to install PCRE development headers. On Linux, use your package manager to install them. On MacOS, the easiest way to install them is with Homebrew:

brew install pcre

If that doesn't help, try running brew doctor and fix the issues it finds.

Quickstart

To compile and run the binary:

stack build
stack exec duckling-example-exe

The first time you run it, it will download all required packages.

This runs a basic HTTP server. Example request:

curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at eight'

In the example application, all dimensions are enabled by default. Provide the parameter dims to specify which ones you want. Examples:

Identify credit card numbers only:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="["credit-card-number"]"'
If you want multiple dimensions, comma-separate them in the array:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="["quantity","numeral"]"'

See exe/ExampleMain.hs for an example on how to integrate Duckling in your project. If your backend doesn't run Haskell or if you don't want to spin your own Duckling server, you can directly use wit.ai's built-in entities.

Supported dimensions

Duckling supports many languages, but most don't support all dimensions yet (we need your help!). Please look into this directory for language-specific support.

Dimension Example input Example value output
AmountOfMoney "42€" {"value":42,"type":"value","unit":"EUR"}
CreditCardNumber "4111-1111-1111-1111" {"value":"4111111111111111","issuer":"visa"}
Distance "6 miles" {"value":6,"type":"value","unit":"mile"}
Duration "3 mins" {"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}
Email "duckling-team@fb.com" {"value":"duckling-team@fb.com"}
Numeral "eighty eight" {"value":88,"type":"value"}
Ordinal "33rd" {"value":33,"type":"value"}
PhoneNumber "+1 (650) 123-4567" {"value":"(+1) 6501234567"}
Quantity "3 cups of sugar" {"value":3,"type":"value","product":"sugar","unit":"cup"}
Temperature "80F" {"value":80,"type":"value","unit":"fahrenheit"}
Time "today at 9am" {"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}
Url "https://api.wit.ai/message?q=hi" {"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}
Volume "4 gallons" {"value":4,"type":"value","unit":"gallon"}

Custom dimensions are also supported.

Extending Duckling

To regenerate the classifiers and run the test suite:

stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test

It's important to regenerate the classifiers after updating the code and before running the test suite.

To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:

  • Duckling/<Dimension>/<Lang>/Rules.hs

  • Duckling/<Dimension>/<Lang>/Corpus.hs

  • Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)

  • Duckling/Rules/<Lang>.hs

To add a new language:

To add a new locale:

Rules have a name, a pattern and a production. Patterns are used to perform character-level matching (regexes on input) and concept-level matching (predicates on tokens). Productions are arbitrary functions that take a list of tokens and return a new token.

The corpus (resp. negative corpus) is a list of examples that should (resp. shouldn't) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at 4:30am.

Duckling.Debug provides a few debugging tools:

$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [Seal Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})) [SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})] Nothing), start = 0, end = 14}]

License

Duckling is BSD-licensed.