stem.snowball
Snowball
type encoding =
| ISO_8859_1
| ISO_8859_2
| KOI8_R
| UTF_8
Character encodings that a stemmer can be created with.
val pp_encoding : Format.formatter -> encoding -> unit
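The printer above follows the usual Format convention, so it can be used with the %a directive. The sketch below is only illustrative: it assumes the library is linked under the name stem.snowball (for example through a dune libraries field), and the helper names all_encodings and encoding_to_string are hypothetical, not part of the library. The exact text printed for each encoding is not documented here, so no output is shown.

```ocaml
(* Assumption: the program is linked against the stem.snowball library. *)

(* Hypothetical helper: the four constructors of Snowball.encoding. *)
let all_encodings : Snowball.encoding list =
  Snowball.[ ISO_8859_1; ISO_8859_2; KOI8_R; UTF_8 ]

(* Hypothetical helper: render an encoding to a string via pp_encoding. *)
let encoding_to_string (e : Snowball.encoding) : string =
  Format.asprintf "%a" Snowball.pp_encoding e

let () =
  (* Print one line per encoding. *)
  List.iter (fun e -> print_endline (encoding_to_string e)) all_encodings
```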
type t
Type of stemmers.
module Language : sig ... end
val languages : Language.t list
Languages available for stemming.
val porter : Language.t
val create : ?encoding:encoding -> Language.t -> t
create ?encoding language creates a stemmer which can be used to stem words via stem.
NOTE: it is important to release a t (via remove) when you are done stemming.
val remove : t -> unit
remove stemmer destroys the underlying structure used to stem words.
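Because a stemmer has to be released explicitly, it can help to pair create and remove in a single wrapper. The following is a minimal sketch, not something the library provides: with_stemmer is a hypothetical helper, and it assumes the library is linked as stem.snowball and that OCaml 4.08 or later is available for Fun.protect.

```ocaml
(* Hypothetical helper: create a stemmer, pass it to [f], and always call
   Snowball.remove afterwards, even if [f] raises an exception. *)
let with_stemmer ?encoding language f =
  let stemmer = Snowball.create ?encoding language in
  Fun.protect
    ~finally:(fun () -> Snowball.remove stemmer)
    (fun () -> f stemmer)

let () =
  (* Stem one word with the Porter stemmer; the stemmer is released on exit. *)
  let stemmed =
    with_stemmer Snowball.porter (fun s -> Snowball.stem s "running")
  in
  print_endline stemmed
```

With this shape the stemmer cannot be used after it has been removed, as long as f does not return or store it.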
val stem : t -> string -> string
stem stemmer word returns the stem of word, computed by stemmer according to the language it was created for.
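A single stemmer can be reused for many words before it is released. The sketch below assumes the library is linked as stem.snowball; stem_words is a hypothetical helper, and lowercasing the input first is an assumption about what the underlying stemmer expects rather than something stated on this page.

```ocaml
(* Hypothetical helper: stem every word of a list with one stemmer.
   Assumption: words are lowercased (ASCII only) before stemming. *)
let stem_words (stemmer : Snowball.t) (words : string list) : string list =
  List.map
    (fun w -> Snowball.stem stemmer (String.lowercase_ascii w))
    words

let () =
  (* Create the stemmer once, with an explicit encoding. *)
  let stemmer = Snowball.create ~encoding:Snowball.UTF_8 Snowball.porter in
  let stems = stem_words stemmer [ "Connected"; "connecting"; "connections" ] in
  (* Release the stemmer once all words have been processed. *)
  Snowball.remove stemmer;
  print_endline (String.concat " " stems)
```

If stem can raise, the call to remove here would be skipped, so wrapping the body in Fun.protect is the safer pattern in longer programs.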