Parseff
Parser combinator library with an imperative-style API.
Parseff is a parser combinator library that uses algebraic effects to provide an imperative-style API for parsers.
let number () =
  (* Read one or more decimal digits and convert them to an int. *)
  let digits =
    Parseff.take_while1 (fun c -> c >= '0' && c <= '9') ~label:"digit"
  in
  int_of_string digits

let ip_address () =
  let a = number () in
  let _ = Parseff.consume "." in
  let b = number () in
  let _ = Parseff.consume "." in
  let c = number () in
  let _ = Parseff.consume "." in
  let d = number () in
  Parseff.end_of_input ();
  (a, b, c, d)

let () =
  match Parseff.parse "192.168.1.1" ip_address with
  | Ok (a, b, c, d) -> Printf.printf "Parsed: %d.%d.%d.%d\n" a b c d
  | Error { pos; error = `Expected msg } -> Printf.printf "Error at %d: %s\n" pos msg
  | Error _ -> Printf.printf "Other error\n"

A span is a zero-copy slice of the input string. Use span_to_string to materialize when needed.
span_to_string s extracts the string from a span. Only call this when you actually need the string value.
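To make the zero-copy idea concrete, here is a minimal standalone sketch. The record fields (buf, off, len) and the take_while_span signature used here are assumptions for illustration only; the source does not show how Parseff actually represents a span.

```ocaml
(* Hypothetical span representation: a view into an existing string. *)
type span = { buf : string; off : int; len : int }

(* Materialize the view; this is the only place a new string is allocated. *)
let span_to_string s = String.sub s.buf s.off s.len

(* Hypothetical scanner: returns a span covering the longest run of characters
   starting at [off] in [buf] that all satisfy [pred]. *)
let take_while_span pred buf off =
  let n = String.length buf in
  let rec scan i = if i < n && pred buf.[i] then scan (i + 1) else i in
  { buf; off; len = scan off - off }

let () =
  let is_digit c = c >= '0' && c <= '9' in
  let s = take_while_span is_digit "2024-01" 0 in
  (* Only this call copies bytes out of the input. *)
  Printf.printf "%d bytes -> %S\n" s.len (span_to_string s)
```

Under this model a span costs three words regardless of how long the matched text is, which is why deferring span_to_string is worthwhile.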
Parse result with support for custom error types.
The result type has two type parameters:
- 'a is the type of the parsed value
- 'e is the type of errors

Built-in errors are polymorphic variants:
- `Expected of string: a specific token or pattern was expected but the input contained something else
- `Unexpected_end_of_input: input ended before the parser could match
- `Depth_limit_exceeded of string: recursive nesting exceeded max_depth (see rec_)

User errors raised via error are also returned as Error.
Non-fatal diagnostic emitted during parsing.
type ('e, 'd) error_with_diagnostics = {
  pos : int;
  error : 'e;
  diagnostics : 'd diagnostic list;
}

type ('a, 'e, 'd) result_with_diagnostics =
  ('a * 'd diagnostic list, ('e, 'd) error_with_diagnostics) result

Parse outcome with diagnostics in both success and failure cases.
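Because diagnostics travel with both outcomes, a caller can collect them without caring whether parsing succeeded. The sketch below restates the two types above so that it compiles on its own; the fields of diagnostic (at, payload) are hypothetical, since the source does not show that type's definition.

```ocaml
(* Hypothetical field names: the source does not show how diagnostic is defined. *)
type 'd diagnostic = { at : int; payload : 'd }

(* Restated from the signatures in this document. *)
type ('e, 'd) error_with_diagnostics = {
  pos : int;
  error : 'e;
  diagnostics : 'd diagnostic list;
}

type ('a, 'e, 'd) result_with_diagnostics =
  ('a * 'd diagnostic list, ('e, 'd) error_with_diagnostics) result

(* Diagnostics are available whether or not parsing succeeded. *)
let diagnostics_of (outcome : ('a, 'e, 'd) result_with_diagnostics) =
  match outcome with
  | Ok (_, ds) -> ds
  | Error { diagnostics; _ } -> diagnostics

let () =
  let ok : (int, string, string) result_with_diagnostics =
    Ok (42, [ { at = 3; payload = "trailing comma" } ])
  in
  let failed : (int, string, string) result_with_diagnostics =
    Error { pos = 7; error = "unexpected token"; diagnostics = [] }
  in
  Printf.printf "%d %d\n"
    (List.length (diagnostics_of ok))
    (List.length (diagnostics_of failed))
```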
val parse :
  ?max_depth:int ->
  string ->
  (unit -> 'a) ->
  ('a,
   [> `Expected of string
   | `Unexpected_end_of_input
   | `Depth_limit_exceeded of string ])
  result

parse ?max_depth input parser runs parser on the string input.
max_depth limits the nesting depth for parsers that use rec_ to mark recursive entry points. Defaults to 128. When exceeded, parsing fails with `Depth_limit_exceeded instead of risking a stack overflow.
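The idea behind a depth limit can be modelled in a few lines of plain OCaml. This is a sketch of the general technique, not Parseff's implementation: make_rec_, nesting, and the exception are hypothetical names, and whether Parseff counts entries exactly this way is an assumption.

```ocaml
exception Depth_limit_exceeded of string

(* Hypothetical model of rec_: counts how many recursive entries are active
   and refuses to go deeper than [max_depth]. *)
let make_rec_ ~max_depth =
  let depth = ref 0 in
  fun body ->
    if !depth >= max_depth then
      raise (Depth_limit_exceeded (Printf.sprintf "max depth %d" max_depth));
    incr depth;
    (* Restore the counter on both normal return and failure. *)
    Fun.protect ~finally:(fun () -> decr depth) body

(* Count the leading '[' characters, entering the guard once per level. *)
let nesting ~max_depth input =
  let rec_ = make_rec_ ~max_depth in
  let rec go i =
    rec_ (fun () ->
        if i < String.length input && input.[i] = '[' then 1 + go (i + 1)
        else 0)
  in
  match go 0 with
  | n -> Ok n
  | exception Depth_limit_exceeded msg -> Error (`Depth_limit_exceeded msg)

let () =
  match nesting ~max_depth:2 "[[[]]]" with
  | Ok n -> Printf.printf "nesting = %d\n" n
  | Error (`Depth_limit_exceeded msg) -> Printf.printf "rejected: %s\n" msg
```

The point of the guard is that the failure is an ordinary error value the caller can match on, rather than a stack overflow.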
This function does not require consuming the full input; call end_of_input in your parser when you want full-consumption behavior.
Returns Ok result on success, or Error { pos; error } on failure. Errors are:
- `Expected msg: the input contained something unexpected
- `Unexpected_end_of_input: the input ended before the parser could match
- `Depth_limit_exceeded msg: recursive nesting exceeded max_depth
- user errors raised via error

Example:
match parse "hello" (fun () -> consume "hello") with
| Ok s ->
  Printf.printf "Matched %S\n" s
| Error { pos; error = `Expected msg } ->
  Printf.printf "Failed at %d: %s\n" pos msg
| Error { error = `Unexpected_end_of_input; _ } ->
  Printf.printf "Input ended too early\n"
| Error _ ->
  Printf.printf "Other error\n"

val parse_until_end :
  ?max_depth:int ->
  string ->
  (unit -> 'a) ->
  ('a,
   [> `Expected of string
   | `Unexpected_end_of_input
   | `Depth_limit_exceeded of string ],
   'd)
  result_with_diagnostics

consume s matches the exact literal string s.
Example:
let parser () =
  let _ = consume "hello" in
  let _ = consume " " in
  consume "world"

satisfy pred ~label matches the next character if pred returns true for it. If the character doesn't match or the input is empty, it fails with label in the error message.
Example:
let vowel () = satisfy (fun c -> String.contains "aeiou" c) ~label:"vowel"

char c matches the exact character c.
Example:
let comma () = char ','

match_regex re matches a regular expression. The regex must be compiled with Re.compile.
Example:
let identifier () =
  let re = Re.compile (Re.Posix.re "[a-zA-Z_][a-zA-Z0-9_]*") in
  match_regex re

take_while pred reads characters one by one as long as pred returns true for each one. Returns the matched string (may be empty if the very first character doesn't match). Much faster than a regex for simple character classes.
Example:
let digits () = take_while (fun c -> c >= '0' && c <= '9')

take_while1 pred ~label is like take_while but requires at least one character to match. Fails with label in the error message if no characters match.
Example:
let digits1 () =
  take_while1 (fun c -> c >= '0' && c <= '9') ~label:"digit"

skip_while pred advances past characters as long as pred returns true. Like take_while but doesn't build a string; use this when you only need to move past characters. Always succeeds (skips nothing if the first character doesn't match).
Example:
let skip_spaces () = skip_while (fun c -> c = ' ')

skip_while_then_char pred c skips characters where pred returns true, then matches the exact character c. More efficient than calling skip_while followed by char separately.
sep_by_take is_whitespace separator is_value_char parses a list of values separated by separator. Whitespace (characters where is_whitespace returns true) is skipped around each separator. Each value consists of characters where is_value_char returns true. Returns a list of matched strings. Runs entirely in a single operation for maximum efficiency.
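The behaviour described for sep_by_take can be pinned down with a plain-OCaml reference function over a whole string. This is only a sketch of the described semantics, not Parseff's code: sep_by_take_ref is a hypothetical name, the separator is assumed to be a single char, and two edge cases are guesses (no whitespace is skipped before the first value, and scanning stops quietly at a separator that is not followed by a value).

```ocaml
(* Hypothetical reference model of the semantics described for sep_by_take. *)
let sep_by_take_ref is_whitespace separator is_value_char input =
  let len = String.length input in
  (* Advance from [i] while [p] holds; returns the first index where it fails. *)
  let rec skip p i = if i < len && p input.[i] then skip p (i + 1) else i in
  (* Read one value starting at [i]; returns the text and the next index. *)
  let take i =
    let j = skip is_value_char i in
    (String.sub input i (j - i), j)
  in
  let rec loop acc i =
    let k = skip is_whitespace i in
    if k < len && input.[k] = separator then
      let v, j = take (skip is_whitespace (k + 1)) in
      (* Assumption: a separator without a following value ends the list. *)
      if v = "" then List.rev acc else loop (v :: acc) j
    else List.rev acc
  in
  let first, i = take 0 in
  if first = "" then [] else loop [ first ] i

let () =
  let is_space c = c = ' ' in
  let is_lower c = c >= 'a' && c <= 'z' in
  sep_by_take_ref is_space ',' is_lower "a , bc,d"
  |> String.concat "|"
  |> print_endline
```

The model makes one pass over the input with index arithmetic only, which gives a sense of what running the whole thing as a single operation can save compared with composing separate skip, match, and take steps.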
take_while_span pred is like take_while but returns a zero-copy span instead of allocating a string. No memory is allocated until you call span_to_string.
sep_by_take_span is_whitespace separator is_value_char is like sep_by_take but returns zero-copy spans instead of strings. No String.sub allocations per element.
fused_sep_take is_whitespace separator is_value_char skips whitespace, matches separator, skips whitespace again, then reads one or more characters where is_value_char returns true. All steps run in a single operation. Returns the taken string. Much more efficient than calling each step separately when parsing separated values.
fail msg aborts parsing with an error message.
Example:
let validate_range n =
  if n >= 0 && n <= 255 then
    n
  else
    fail "number out of range"

error e aborts parsing with a user-defined error value.
The error is returned as Error { pos; error = e }. Use polymorphic variants for rich error reporting:
let number () =
  let n = parse_int () in
  if n > 255 then error (`Out_of_range n)
  else if n < 0 then error (`Negative n)
  else n

match parse input number with
| Ok n -> Printf.printf "Got %d\n" n
| Error { error = `Out_of_range n; _ } ->
  Printf.printf "%d is too large\n" n
| Error { error = `Negative n; _ } ->
  Printf.printf "%d is negative\n" n
| Error _ -> Printf.printf "Parse error\n"

Note: User errors from error pass through expect and one_of_labeled without being caught or relabeled. However, backtracking combinators (or_, many, one_of, optional, look_ahead) will catch and absorb user errors just like any other parse failure. If you need an error to escape backtracking, raise an OCaml exception instead.
warn diagnostic records a non-fatal diagnostic at the current position and continues parsing.
warn_at ~pos diagnostic records a non-fatal diagnostic at pos and continues parsing.
position () returns the current parser offset in bytes from the start of the input.
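Since the offset is counted in bytes, it can differ from the number of characters a user sees when the input contains multi-byte UTF-8 sequences. The standalone snippet below uses only the standard library to show the difference; sample and first_l are names made up for this illustration.

```ocaml
(* "héllo" written out as UTF-8 bytes: 'é' occupies two bytes (0xC3 0xA9). *)
let sample = "h\xc3\xa9llo"

(* Byte index of the first 'l'. It is the third character but not the third byte. *)
let first_l = String.index sample 'l'

let () =
  Printf.printf "length in bytes: %d, first 'l' at byte %d\n"
    (String.length sample) first_l
```

Keep this in mind when turning an error's pos into a column number for display.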
end_of_input () succeeds only if no input remains. Use this to ensure the entire input has been consumed.
Example:
let complete_parser () =
  let result = some_parser () in
  end_of_input ();
  result

or_ is the alternation combinator. It tries the left parser; if that fails, it backtracks and tries the right parser.
Example:
let bool_parser () =
  or_
    (fun () ->
      let _ = consume "true" in
      true)
    (fun () ->
      let _ = consume "false" in
      false)
    ()

look_ahead parser runs parser without consuming any input. If parser succeeds, the position stays where it was before, which is useful for peeking at what comes next. Fails if parser fails.
Example:
let check_next_is_digit () = look_ahead digit
(* position hasn't moved *)

rec_ parser marks a recursive entry point for depth tracking. Wrap the body of recursive parsers with rec_ so that parse can enforce max_depth and fail cleanly instead of overflowing the stack.
Example:
let rec json () = Parseff.rec_ (fun () ->
  Parseff.one_of [ array_parser; null_parser; ... ] ()
)
and array_parser () =
  let _ = Parseff.consume "[" in
  let elements = ... json () ... in
  ...

expect description parser runs parser and, if it fails with a parse error, replaces the error message with description. Reads naturally: "expect a dot separator".
Only parse errors (from fail, consume, satisfy, etc.) are relabeled. User errors raised via error propagate unchanged, which lets you use expect around parsers that perform validation without losing the structured error:
let octet () =
  expect "an octet (0-255)" (fun () ->
    let n = number () in
    if n > 255 then
      error (`Out_of_range n)
    else
      n
  )
(* A non-digit input gives: "expected an octet (0-255)" *)
(* Input "300" gives: `Out_of_range 300, not swallowed *)

Example:
let dot () = expect "a dot separator" (fun () -> char '.')
let digit_val () = expect "a digit (0-9)" digit

one_of parsers tries each parser in order until one succeeds.
Example:
let keyword () =
  one_of
    [
      (fun () -> consume "if");
      (fun () -> consume "else");
      (fun () -> consume "while");
    ]
    ()

one_of_labeled labeled_parsers tries each parser in order. On failure, reports all labels in the error message.
Like expect, only parse errors are relabeled. User errors raised via error inside any branch propagate unchanged.
Example:
let literal () =
  one_of_labeled
    [
      ("number", number_parser);
      ("string", string_parser);
      ("boolean", bool_parser);
    ]
    ()
(* On failure: "expected one of: number, string, boolean" *)

many parser applies parser repeatedly until it fails. Returns a list of all successful results. Always succeeds (returns [] if parser fails immediately).
Example:
let digits () = many digit () (* parses "123" -> [1; 2; 3] *)

many1 parser is like many but requires at least one successful match. Fails if parser doesn't succeed at least once.
Example:
let non_empty_digits () = many1 digit ()

sep_by element separator parses zero or more occurrences of element with separator between each pair. Returns a list of the parsed elements.
Example:
let csv_line () =
  sep_by
    (fun () -> match_regex (Re.compile (Re.Posix.re "[^,]+")))
    (fun () -> char ',')
    ()

sep_by1 element separator is like sep_by but requires at least one element to match.
between open_ close_ parser parses open_, then parser, then close_, and returns the value produced by parser.
end_by element separator parses zero or more elements, each followed by separator.
end_by1 element separator is like end_by but requires at least one element.
chainl element op default parses zero or more element values separated by op, combining them left-associatively. Returns default if there are zero element values.
chainl1 element op parses one or more element values separated by op, combining them left-associatively.
chainr element op default parses zero or more element values separated by op, combining them right-associatively. Returns default if there are zero element values.
chainr1 element op parses one or more element values separated by op, combining them right-associatively.
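The difference between left- and right-associative chaining only shows up with operators such as subtraction, where grouping changes the result. The sketch below models the combining step on an already-parsed first operand and a list of (operator, operand) pairs; fold_left_assoc and fold_right_assoc are hypothetical helper names and say nothing about how Parseff implements the chain combinators.

```ocaml
(* Left-associative: ((first op1 x1) op2 x2) ... *)
let fold_left_assoc first rest =
  List.fold_left (fun acc (op, x) -> op acc x) first rest

(* Right-associative: first op1 (x1 op2 (x2 ...)) *)
let rec fold_right_assoc first rest =
  match rest with
  | [] -> first
  | (op, x) :: tl -> op first (fold_right_assoc x tl)

let () =
  (* Models the input "10 - 4 - 3". *)
  let rest = [ (( - ), 4); (( - ), 3) ] in
  Printf.printf "left:  %d\n" (fold_left_assoc 10 rest);
  Printf.printf "right: %d\n" (fold_right_assoc 10 rest)
```

Grouped to the left this is (10 - 4) - 3 = 3; grouped to the right it is 10 - (4 - 3) = 9. Arithmetic subtraction and division normally want the chainl family, while operators such as exponentiation or list cons want chainr.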
optional parser tries to apply parser. Returns Some result if it succeeds, or None if it fails (without consuming input).
Example:
let optional_sign () =
  optional (fun () -> or_ (fun () -> char '-') (fun () -> char '+') ()) ()

count n parser applies parser exactly n times. Fails if parser doesn't succeed n times.
Example:
let three_digits () = count 3 digit ()

digit () parses a decimal digit (0-9) and returns its integer value.
Example:
let d = digit () in (* parses "7" -> 7 *)
...

letter () parses an ASCII letter (a-z or A-Z).
is_whitespace c returns true for whitespace characters (space, tab, newline, CR).
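Written out, the predicate described above amounts to a four-way comparison. The definition below follows the character set listed in this document; skip_whitespace_from is a hypothetical helper added only to show the predicate driving a scan, and is not part of Parseff.

```ocaml
(* The four characters listed above: space, tab, newline, carriage return. *)
let is_whitespace c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* Hypothetical helper: index of the first non-whitespace byte at or after [i]. *)
let skip_whitespace_from input i =
  let n = String.length input in
  let rec go j = if j < n && is_whitespace input.[j] then go (j + 1) else j in
  go i

let () =
  Printf.printf "first non-blank at byte %d\n" (skip_whitespace_from " \t\nx" 0)
```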
whitespace () parses zero or more whitespace characters (space, tab, newline, carriage return). Uses fast character scanning (not regex).
whitespace1 () parses one or more whitespace characters.
skip_whitespace () skips zero or more whitespace characters (returns unit). More efficient than whitespace when you don't need the matched string.
alphanum () parses an alphanumeric character (letter or digit).
any_char () parses any character.
Input sources for incremental parsing. A source wraps a readable byte stream — a channel, file descriptor, or custom reader — behind a uniform interface. The parser pulls data on demand through the effect handler; existing parser code works unchanged.
val parse_source :
  ?max_depth:int ->
  Source.t ->
  (unit -> 'a) ->
  ('a,
   [> `Expected of string
   | `Unexpected_end_of_input
   | `Depth_limit_exceeded of string ])
  result

parse_source ?max_depth source parser runs parser, pulling input from source on demand. It behaves identically to parse, but the input does not need to be fully available up front.
The same parsers work with both parse and parse_source — no changes required.
Example:
let ic = open_in "data.json" in
let source = Source.of_channel ic in
let result = parse_source source json in
close_in ic;
result

val parse_source_until_end :
  ?max_depth:int ->
  Source.t ->
  (unit -> 'a) ->
  ('a,
   [> `Expected of string
   | `Unexpected_end_of_input
   | `Depth_limit_exceeded of string ],
   'd)
  result_with_diagnostics

parse_source_until_end ?max_depth source parser is the streaming equivalent of parse_until_end. It enforces full consumption and returns diagnostics on both success and failure.
Example:
let ic = open_in "data.json" in
let source = Source.of_channel ic in
let outcome = parse_source_until_end source json in
close_in ic;
outcome