PCRE-OCaml - Perl Compatible Regular Expressions for OCaml

This OCaml library interfaces with the C library PCRE, providing Perl-compatible regular expressions for string matching.

Features

PCRE-OCaml offers:

  • Pattern searching
  • Subpattern extraction
  • String splitting by patterns
  • Pattern substitution
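
As a rough illustration of the four operations listed above, the following sketch uses Pcre.pmatch, Pcre.extract, Pcre.split, and Pcre.replace from the library's Pcre module. The subject string, the patterns, and the binding names are invented for this example and do not come from the library's own examples.

```ocaml
(* Illustrative only: the subject string and patterns are invented. *)
let subject = "key1=alpha, key2=beta"

(* Pattern searching: true if the pattern matches anywhere in the subject. *)
let found = Pcre.pmatch ~pat:"key\\d=" subject

(* Subpattern extraction: index 0 holds the whole first match,
   the following indices hold its captured groups. *)
let groups = Pcre.extract ~pat:"(\\w+)=(\\w+)" subject

(* String splitting by a pattern. *)
let parts = Pcre.split ~pat:",\\s*" subject

(* Pattern substitution: $1 and $2 in the template refer to the captured
   groups; Pcre.replace rewrites every match in the subject. *)
let swapped = Pcre.replace ~pat:"(\\w+)=(\\w+)" ~templ:"$2=$1" subject

let () =
  Printf.printf "found: %b\n" found;
  Array.iter (fun g -> Printf.printf "group: %s\n" g) groups;
  List.iter (fun p -> Printf.printf "part: %s\n" p) parts;
  Printf.printf "swapped: %s\n" swapped
```

Each call above passes the pattern with ~pat for brevity, which compiles it on every call; code that runs repeatedly would normally compile the pattern once with Pcre.regexp and pass it with ~rex.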

Reasons to choose PCRE-OCaml:

  • The PCRE library by Philip Hazel is mature and stable, implementing nearly all Perl regular expression features. The high-level OCaml functions (split, replace, etc.) behave like their Perl counterparts as far as OCaml allows. Some developers find Perl-style regex syntax more intuitive and powerful than the Emacs-style syntax used in OCaml's Str module.

  • PCRE-OCaml is reentrant and thread-safe, unlike the Str module. Because it is reentrant, callers need not worry about hidden library state.

  • The high-level replacement and substitution functions, which are implemented in OCaml, are faster than those in the Str module. When compiled to native code, they can even outperform Perl's C-based functions.

  • Returned data is unique: a value returned by a function is not shared with any other result, so it can be updated destructively without side effects.

  • The library interface uses labels and default arguments for enhanced programming comfort.

Usage

To generate the API documentation for the installed package, run:

odig odoc pcre

Or, from a source checkout:

dune build @doc

Consult the API documentation for details.

Functions support two flag types:

  1. Convenience flags: Readable and concise, translated internally on each call. Example:

    let rex = Pcre.regexp ~flags:[`ANCHORED; `CASELESS] "some pattern" in
    (* ... *)

    These are easy to use, but translating them on every call adds overhead in loops. When performance matters, use internal flags instead.

  2. Internal flags: Translated from convenience flags once, ahead of time, so that calls inside a loop skip the translation. Example:

    let iflags = Pcre.cflags [`ANCHORED; `CASELESS] in
    for i = 1 to 1000 do
      let rex = Pcre.regexp ~iflags "some pattern constructed at runtime" in
      (* ... *)
    done

    Translating flags outside the loop saves cycles. For the same reason, avoid compiling a regular expression inside a loop:

    for i = 1 to 1000 do
      let chunks = Pcre.split ~pat:"[ \t]+" "foo bar" in
      (* ... *)
    done

    Instead, compile the regular expression once, before the loop:

    let rex = Pcre.regexp "[ \t]+" in
    for i = 1 to 1000 do
      let chunks = Pcre.split ~rex "foo bar" in
      (* ... *)
    done
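
The same advice carries over to code that is not written as an explicit loop. The sketch below is one possible arrangement, not the library's prescribed style: flags are translated once at module initialisation, and a regexp is compiled once and captured in a closure. The names compile and split_fields are hypothetical helpers introduced only for this illustration.

```ocaml
(* Flags are translated once, when this module is initialised. *)
let iflags = Pcre.cflags [`ANCHORED; `CASELESS]

(* Hypothetical helper: compile a pattern that is only known at runtime,
   reusing the pre-translated flags. *)
let compile pattern = Pcre.regexp ~iflags pattern

(* Hypothetical helper: the regexp is compiled once, and the returned
   closure reuses it on every call. *)
let split_fields =
  let rex = Pcre.regexp "[ \t]+" in
  fun line -> Pcre.split ~rex line

let () =
  let rex = compile "hello" in
  Printf.printf "anchored, caseless match: %b\n" (Pcre.pmatch ~rex "HELLO world");
  List.iter
    (fun line -> Printf.printf "%d fields\n" (List.length (split_fields line)))
    ["foo bar"; "a\tb  c"]
```

Callers of split_fields never see the regexp, so they cannot accidentally recompile it per call.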

Functions use optional arguments with intuitive defaults. For instance, Pcre.split defaults to whitespace as the pattern. The examples directory contains applications demonstrating PCRE-OCaml's functionality.
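
A minimal sketch of how the defaults read in practice, assuming only what the paragraph above states (that Pcre.split falls back to a whitespace pattern when neither ~rex nor ~pat is given). The input strings are invented for illustration.

```ocaml
(* No ~rex or ~pat: Pcre.split uses its default whitespace pattern. *)
let words = Pcre.split "foo  bar\tbaz"

(* An optional argument is supplied by label only when the default
   does not fit; here ~pat overrides the default pattern. *)
let fields = Pcre.split ~pat:":" "root:x:0"

let () =
  List.iter print_endline words;
  List.iter print_endline fields
```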

Restartable (Partial) Pattern Matching

PCRE includes an alternative DFA matching function that can restart a partial match with additional input. PCRE-OCaml exposes it as pcre_dfa_exec. It cannot be used to extract submatches or to split strings, but it is useful for streaming and search tasks.

Example of restarting a partial match:

utop # open Pcre;;
utop # let rex = regexp "12+3";;
val rex : regexp = <abstr>
utop # let workspace = Array.make 40 0;;
val workspace : int array =
  [| ... |]
utop # pcre_dfa_exec ~rex ~flags:[`PARTIAL] ~workspace "12222";;
Exception: Pcre.Error Partial.
utop # pcre_dfa_exec ~rex ~flags:[`PARTIAL; `DFA_RESTART] ~workspace "2222222";;
Exception: Pcre.Error Partial.
utop # pcre_dfa_exec ~rex ~flags:[`PARTIAL; `DFA_RESTART] ~workspace "2222222";;
Exception: Pcre.Error Partial.
utop # pcre_dfa_exec ~rex ~flags:[`PARTIAL; `DFA_RESTART] ~workspace "223xxxx";;
- : int array = [|0; 3; 0|]

Refer to the pcre_dfa_exec documentation and the dfa_restart example for more information.
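
The interactive session above can be wrapped in a small function for streaming input. The following is a sketch under stated assumptions rather than the dfa_restart example itself: feed_chunks is a hypothetical helper, the workspace size of 40 is copied from the session, and it assumes that a failed match raises Not_found as the other matching functions do.

```ocaml
(* Hypothetical helper: feed chunks to the DFA matcher one at a time,
   restarting after each partial match. Returns Some result on a complete
   match and None if the input runs out first. *)
let feed_chunks ~rex chunks =
  (* The same workspace must be passed to every restarted call, because
     it carries the matcher's state between chunks. *)
  let workspace = Array.make 40 0 in
  let rec loop flags = function
    | [] -> None
    | chunk :: rest ->
      (match Pcre.pcre_dfa_exec ~rex ~flags ~workspace chunk with
       | result -> Some result
       | exception Pcre.Error Pcre.Partial ->
         (* Partial match so far: continue with the next chunk. *)
         loop [`PARTIAL; `DFA_RESTART] rest
       | exception Not_found ->
         (* Assumption: no match is reported as Not_found. *)
         None)
  in
  loop [`PARTIAL] chunks

let () =
  let rex = Pcre.regexp "12+3" in
  match feed_chunks ~rex ["12222"; "2222222"; "223xxxx"] with
  | Some result ->
    Printf.printf "matched, result array has %d entries\n" (Array.length result)
  | None -> print_endline "no complete match"
```

A real streaming consumer would likely also decide how to resume after a failed match, which this sketch deliberately leaves out.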

Contact Information and Contributing

Submit bug reports, feature requests, and contributions via the GitHub issue tracker.

For the latest information, visit: https://mmottl.github.io/pcre-ocaml