Module Kernel.LrcSource

LRC (LR with classes)

This module computes a refinement of LR(1) states by reachable "lookahead-classes". When an LR(1) automaton is pruned to resolve conflicts, some sequence of transitions might become unreachable, while the individual transitions themselves stay reachable. For instance a sequence of states s0 -> s1 -> s2 on the stack might be impossible to reach, while s0 -> s1 and s1 -> s2 are individually reachable because there is lookahead that permit to reach both consecutively. Admittedly, this only happens for severely pruned automaton and not matter much in practice. Most of the time, LRC after minimization coincide with LR(1) states. TODO: Maybe I could drop LRC to simplify things and accept the rare over-approximations of reachable stacks? I should quantify the problem by measuring how often LRC diverge from LR(1).

Type definitions:

Main functions:

Tricky details:

Sourcetype 'g n

Type-level index for LRC states

Sourcetype ('g, 'n) t = {
  1. lr1_of : ('n, 'g Info.lr1 Fix.Indexing.index) Fix.Indexing.vector;
  2. lrcs_of : ('g Info.lr1, 'n Utils.Misc.indexset) Fix.Indexing.vector;
  3. all_wait : 'n Utils.Misc.indexset;
  4. all_leaf : 'n Utils.Misc.indexset;
  5. all_successors : ('n, 'n Utils.Misc.indexset) Fix.Indexing.vector;
  6. reachable_from : ('n, 'n Utils.Misc.indexset) Fix.Indexing.vector;
}

LRC structure mapping between LRC and LR1 states, with transition relations. lr1_of maps each LRC state to its underlying LR1 state. lrcs_of maps each LR1 state to the set of LRC states refining it. all_wait and all_leaf are the sets of wait and leaf LRC states respectively. all_successors gives, for each LRC state, the set of LRC states that can transition to it. reachable_from is the transitive closure of all_successors.

Sourceval make : 'g Info.grammar -> 'g Reachability.t -> ('g, 'g n) t

Build the LRC structure from a grammar and reachability analysis. Iterates over all LR1 states and their lookahead classes, creating an LRC state for each reachable (LR1 state, lookahead class) pair.

Sourceval to_string : 'g Info.grammar -> ('g, 'g n) t -> 'g n Fix.Indexing.index -> string

Convert an LRC state to a string of the form lr1_state/class_index.

Sourceval set_to_string : 'g Info.grammar -> ('g, 'g n) t -> 'g n Utils.Misc.indexset -> string

Convert a set of LRC states to a string representation.

Sourcetype 'n entrypoints = {
  1. reachable : 'n Utils.Misc.indexset;
  2. wait : 'n Utils.Misc.indexset;
  3. entrypoints : 'n Utils.Misc.indexset;
  4. successors : ('n, 'n Utils.Misc.indexset) Fix.Indexing.vector;
  5. predecessors : ('n, 'n Utils.Misc.indexset) Fix.Indexing.vector;
  6. some_prefix : 'n Fix.Indexing.index -> int * 'n Fix.Indexing.index list;
    (*

    some_prefix state returns a prefix to reach state from an entrypoint. The prefix's length and the sequence of states (excluding state) are given, starting from the end. Thus, List.rev (state :: some_prefix state) is a valid prefix.

    *)
}
Sourceval from_entrypoints : 'g Info.grammar -> ('g, 'n) t -> 'n Utils.Misc.indexset -> 'n entrypoints

Compute the subset of LRC states reachable from specific entrypoints, along with successor/predecessor relations and a some_prefix function for minimal-length path reconstruction.

Sourceval check_deterministic : 'g Info.grammar -> 'g Reachability.t -> unit

Verify that the LRC automaton is deterministic. Emits diagnostics to stderr if a dead-end is found. Intended for debugging.

Sourcetype 'g mlrc

Minimal LRC states, obtained by quotienting equivalent states via Valmari's DFA minimization algorithm.

Sourceval make_minimal : 'g Info.grammar -> 'g Reachability.t -> ('g, 'g mlrc) t

Build a minimal LRC structure by determinizing and minimizing the LRC automaton. States with identical transition behavior are merged, significantly reducing memory usage.