Funcparserlib

Recursive descent parsing library for Python based on functional combinators.

Description

The primary focus of funcparserlib is parsing little languages or external DSLs (domain-specific languages).

Parsers made with funcparserlib are pure-Python LL(*) parsers. This means that it is very easy to write parsers without thinking about lookaheads and other hardcore parsing details. However, recursive descent parsing is a rather slow method compared to LL(k) or LR(k) algorithms. Still, parsing with funcparserlib is at least twice as fast as with PyParsing, a very popular library for Python.

The source code of funcparserlib is only 1.2K lines of code, with lots of comments. Its API is fully type hinted. It reports errors based on the longest parsed prefix and includes a tiny lexer generator that tracks token positions.

The idea of parser combinators used in funcparserlib comes from the Introduction to Functional Programming course. We have converted it from ML into Python.
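
To make the idea of parser combinators more concrete, here is a minimal sketch in plain Python. It is not funcparserlib's implementation: the names char, seq, alt, and many_of are hypothetical and exist only for this illustration. Each parser is a function that takes the input and a position and returns either a (value, new position) pair or None.

```python
from typing import Callable, Optional, Tuple

# A parser maps (input, position) to (value, new position), or None on failure.
# This is an illustrative sketch, not how funcparserlib represents parsers.
ParserFn = Callable[[str, int], Optional[Tuple[object, int]]]


def char(c: str) -> ParserFn:
    """Match one expected character."""
    def run(s: str, i: int):
        if i < len(s) and s[i] == c:
            return c, i + 1
        return None
    return run


def seq(p: ParserFn, q: ParserFn) -> ParserFn:
    """Run p, then q; succeed only if both match."""
    def run(s: str, i: int):
        r1 = p(s, i)
        if r1 is None:
            return None
        v1, j = r1
        r2 = q(s, j)
        if r2 is None:
            return None
        v2, k = r2
        return (v1, v2), k
    return run


def alt(p: ParserFn, q: ParserFn) -> ParserFn:
    """Try p; if it fails, try q from the same position."""
    def run(s: str, i: int):
        r = p(s, i)
        return r if r is not None else q(s, i)
    return run


def many_of(p: ParserFn) -> ParserFn:
    """Apply p zero or more times and collect the values in a list."""
    def run(s: str, i: int):
        values = []
        while True:
            r = p(s, i)
            if r is None:
                return values, i
            v, i = r
            values.append(v)
    return run


# One or more binary digits, built only from the combinators above.
digit = alt(char("0"), char("1"))
bits = seq(digit, many_of(digit))
print(bits("101x", 0))  # (('1', ['0', '1']), 3)
```

Larger parsers are built by composing smaller ones, which is the same style that the + and | operators and many() express in the example later in this README.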

Installation

You can install funcparserlib from PyPI:

$ pip install funcparserlib

There are no dependencies on other libraries.

Documentation

There are several examples available in the tests/ directory.

See also the changelog.

Example

Let's consider a little language of numeric expressions with a syntax similar to Python expressions. Here are some expression strings in this language:

0
1 + 2 + 3
-1 + 2 ** 32
3.1415926 * (2 + 7.18281828e-1) * 42

Here is the complete source code of the tokenizer and the parser for this language written using funcparserlib:

from typing import List, Tuple, Union
from dataclasses import dataclass

from funcparserlib.lexer import make_tokenizer, TokenSpec, Token
from funcparserlib.parser import tok, Parser, many, forward_decl, finished


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]


def tokenize(s: str) -> List[Token]:
    specs = [
        TokenSpec("whitespace", r"\s+"),
        TokenSpec("float", r"[+\-]?\d+\.\d*([Ee][+\-]?\d+)*"),
        TokenSpec("int", r"[+\-]?\d+"),
        TokenSpec("op", r"(\*\*)|[+\-*/()]"),
    ]
    tokenizer = make_tokenizer(specs)
    return [t for t in tokenizer(s) if t.type != "whitespace"]


def parse(tokens: List[Token]) -> Expr:
    int_num = tok("int") >> int
    float_num = tok("float") >> float
    number = int_num | float_num

    expr: Parser[Token, Expr] = forward_decl()
    parenthesized = -op("(") + expr + -op(")")
    primary = number | parenthesized
    power = primary + many(op("**") + primary) >> to_expr
    term = power + many((op("*") | op("/")) + power) >> to_expr
    sum = term + many((op("+") | op("-")) + term) >> to_expr
    expr.define(sum)

    document = expr + -finished

    return document.parse(tokens)


def op(name: str) -> Parser[Token, str]:
    return tok("op", name)


def to_expr(args: Tuple[Expr, List[Tuple[str, Expr]]]) -> Expr:
    first, rest = args
    result = first
    for op, expr in rest:
        result = BinaryExpr(op, result, expr)
    return result
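
To see what to_expr does on its own, here is a small standalone sketch that repeats BinaryExpr and to_expr from the code above and feeds them a hand-written value. The tuple shape (first, [(op, expr), ...]) follows the type annotation of to_expr; the concrete input below is an assumption made for illustration, not output captured from the parser.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]


def to_expr(args: Tuple[Expr, List[Tuple[str, Expr]]]) -> Expr:
    # Fold the (operator, operand) pairs into a left-nested tree.
    first, rest = args
    result = first
    for op, expr in rest:
        result = BinaryExpr(op, result, expr)
    return result


# Hand-written stand-in for what `term + many(...)` would yield for "1 + 2 + 3".
parsed = (1, [("+", 2), ("+", 3)])
tree = to_expr(parsed)
print(tree)
# BinaryExpr(op='+', left=BinaryExpr(op='+', left=1, right=2), right=3)
```

The fold always nests to the left. Because the power rule uses the same to_expr, this grammar groups 2 ** 3 ** 2 as (2 ** 3) ** 2, whereas Python itself groups ** to the right.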

Now, consider this numeric expression: 3.1415926 * (2 + 7.18281828e-1) * 42.

Let's tokenize() it using the tokenizer we've created with funcparserlib.lexer:

[
    Token('float', '3.1415926'),
    Token('op', '*'),
    Token('op', '('),
    Token('int', '2'),
    Token('op', '+'),
    Token('float', '7.18281828e-1'),
    Token('op', ')'),
    Token('op', '*'),
    Token('int', '42'),
]
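
The token list above does not show positions. As a rough illustration of how a regex-based tokenizer can try its specs in order and track line and column, here is a sketch that uses only the standard re module. Tok and simple_tokenize are hypothetical names for this example, and the code is not funcparserlib.lexer itself; only the four patterns are taken from the tokenize() function above.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Tok(NamedTuple):  # hypothetical stand-in, not funcparserlib's Token
    type: str
    value: str
    start: Tuple[int, int]  # (line, column), both 1-based


# Same patterns and order as in tokenize() above.
SPECS: List[Tuple[str, str]] = [
    ("whitespace", r"\s+"),
    ("float", r"[+\-]?\d+\.\d*([Ee][+\-]?\d+)*"),
    ("int", r"[+\-]?\d+"),
    ("op", r"(\*\*)|[+\-*/()]"),
]


def simple_tokenize(s: str) -> Iterator[Tok]:
    compiled = [(name, re.compile(pattern)) for name, pattern in SPECS]
    i, line, col = 0, 1, 1
    while i < len(s):
        # The first spec that matches a non-empty string at position i wins.
        for name, regex in compiled:
            m = regex.match(s, i)
            if m and m.end() > i:
                break
        else:
            raise ValueError(f"cannot tokenize at line {line}, column {col}")
        value = m.group()
        yield Tok(name, value, (line, col))
        # Advance the position, restarting the column count after a newline.
        newlines = value.count("\n")
        if newlines:
            line += newlines
            col = len(value) - value.rfind("\n")
        else:
            col += len(value)
        i = m.end()


text = "3.1415926 * (2 + 7.18281828e-1) * 42"
tokens = [t for t in simple_tokenize(text) if t.type != "whitespace"]
for t in tokens:
    print(t)
```

In this sketch the order of the specs matters: if "int" came before "float", the input 3.1415926 would start with an int token 3. That is presumably why the float spec is listed first in tokenize() as well.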

Let's parse() these tokens into an expression tree using our parser created with funcparserlib.parser:

BinaryExpr(
    op='*',
    left=BinaryExpr(
        op='*',
        left=3.1415926,
        right=BinaryExpr(op='+', left=2, right=0.718281828),
    ),
    right=42,
)
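
Once you have such a tree, you can walk it recursively. The sketch below evaluates the tree shown above. The evaluate function and the OPERATORS table are hypothetical helpers written for this illustration and are not part of funcparserlib.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]

# Map each operator token of the little language to a Python function.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
}


def evaluate(expr: Expr) -> Union[int, float]:
    # Inner nodes apply their operator to the evaluated children;
    # leaves are plain numbers and are returned as they are.
    if isinstance(expr, BinaryExpr):
        return OPERATORS[expr.op](evaluate(expr.left), evaluate(expr.right))
    return expr


tree = BinaryExpr(
    op="*",
    left=BinaryExpr(
        op="*",
        left=3.1415926,
        right=BinaryExpr(op="+", left=2, right=0.718281828),
    ),
    right=42,
)
print(evaluate(tree))
```

Keeping the tree as plain dataclasses and numbers means that an evaluator like this needs no knowledge of the parser that produced it.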

Learn how to write this parser using funcparserlib in the Getting Started guide!

Used By

Some open-source projects that use funcparserlib as an explicit dependency:

  • Hy, a Lisp dialect that's embedded in Python
    • 4.7K stars, version ~=1.0, Python 3.8+
  • Splash, a JavaScript rendering service with HTTP API, by Scrapinghub
    • 3.9K stars, version *, Python 3 in Docker
  • graphite-beacon, a simple alerting system for Graphite metrics
    • 453 stars, version ==0.3.6, Python 2 and 3
  • blockdiag, generates block-diagram image file from spec-text file
    • 194 stars, version >= 1.0.0a0, Python 3.7+
  • kll, Keyboard Layout Language (KLL) compiler
    • 113 stars, copied source code, Python 3.5+

Next

Read the Getting Started guide to start learning funcparserlib.