Kernel.LrcSourceLRC (LR with classes)
This module computes a refinement of LR(1) states by reachable "lookahead-classes". When an LR(1) automaton is pruned to resolve conflicts, some sequence of transitions might become unreachable, while the individual transitions themselves stay reachable. For instance a sequence of states s0 -> s1 -> s2 on the stack might be impossible to reach, while s0 -> s1 and s1 -> s2 are individually reachable because there is lookahead that permit to reach both consecutively. Admittedly, this only happens for severely pruned automaton and not matter much in practice. Most of the time, LRC after minimization coincide with LR(1) states. TODO: Maybe I could drop LRC to simplify things and accept the rare over-approximations of reachable stacks? I should quantify the problem by measuring how often LRC diverge from LR(1).
Type definitions:
lr1_of: Mapping from LRC states to their underlying LR1 stateslrcs_of: Mapping from LR1 states to sets of LRC statesall_wait, all_leaf: Sets of wait and leaf LRC statesall_successors, reachable_from: Transition relationsreachable: States reachable from entrypointswait: Wait states among reachable statesentrypoints: The initial entrypoint statessuccessors, predecessors: Transition relationssome_prefix: Computing prefixes to reach states, returning both length and path.Main functions:
make: Build LRC structure from grammar and reachability analysismake_minimal: Build minimal LRC structure with state minimizationfrom_entrypoints: Compute reachability from specific entrypointscheck_deterministic: Verify the LRC automaton is deterministic (for debugging)Tricky details:
make_minimal function uses Valmari's algorithm to quotient equivalent LRC states, significantly reducing memory usage.some_prefix function computes minimal-length paths to each state, and coverage analysis. FIXME someday: Prefixes are minimal in number of symbols, not in number of terminals, which is ultimately what we care about when generating counter-examples. A short prefix can expand to a long sentence.Type-level index for LRC states
type ('g, 'n) t = {lr1_of : ('n, 'g Info.lr1 Fix.Indexing.index) Fix.Indexing.vector;lrcs_of : ('g Info.lr1, 'n Utils.Misc.indexset) Fix.Indexing.vector;all_wait : 'n Utils.Misc.indexset;all_leaf : 'n Utils.Misc.indexset;all_successors : ('n, 'n Utils.Misc.indexset) Fix.Indexing.vector;reachable_from : ('n, 'n Utils.Misc.indexset) Fix.Indexing.vector;}LRC structure mapping between LRC and LR1 states, with transition relations. lr1_of maps each LRC state to its underlying LR1 state. lrcs_of maps each LR1 state to the set of LRC states refining it. all_wait and all_leaf are the sets of wait and leaf LRC states respectively. all_successors gives, for each LRC state, the set of LRC states that can transition to it. reachable_from is the transitive closure of all_successors.
Build the LRC structure from a grammar and reachability analysis. Iterates over all LR1 states and their lookahead classes, creating an LRC state for each reachable (LR1 state, lookahead class) pair.
Convert an LRC state to a string of the form lr1_state/class_index.
Convert a set of LRC states to a string representation.
type 'n entrypoints = {reachable : 'n Utils.Misc.indexset;wait : 'n Utils.Misc.indexset;entrypoints : 'n Utils.Misc.indexset;successors : ('n, 'n Utils.Misc.indexset) Fix.Indexing.vector;predecessors : ('n, 'n Utils.Misc.indexset) Fix.Indexing.vector;some_prefix : 'n Fix.Indexing.index -> int * 'n Fix.Indexing.index list;some_prefix state returns a prefix to reach state from an entrypoint. The prefix's length and the sequence of states (excluding state) are given, starting from the end. Thus, List.rev (state :: some_prefix state) is a valid prefix.
}val from_entrypoints :
'g Info.grammar ->
('g, 'n) t ->
'n Utils.Misc.indexset ->
'n entrypointsCompute the subset of LRC states reachable from specific entrypoints, along with successor/predecessor relations and a some_prefix function for minimal-length path reconstruction.
Verify that the LRC automaton is deterministic. Emits diagnostics to stderr if a dead-end is found. Intended for debugging.
Minimal LRC states, obtained by quotienting equivalent states via Valmari's DFA minimization algorithm.
Build a minimal LRC structure by determinizing and minimizing the LRC automaton. States with identical transition behavior are merged, significantly reducing memory usage.