Module Stringx

module Levenshtein : sig ... end
val center : string -> int -> string -> string

center s len pad centers s in a string of len characters, filling both sides with copies of pad. If s already has len or more characters, or if pad is empty, s is returned unchanged. Otherwise the padding is split as evenly as possible between the left and right sides.

This function is Unicode-aware and counts characters (code points), not bytes. A pad of several characters, including multibyte ones, is repeated and truncated as needed to fill each side.

Examples:

  • center "hello" 10 " " returns "  hello   "
  • center "abc" 7 "ใ‚" returns "ใ‚ใ‚abcใ‚ใ‚"
  • parameter s

    The string to center (UTF-8)

  • parameter len

    The total length (in Unicode characters) of the result

  • parameter pad

    The padding string (UTF-8, non-empty)

  • returns

    The centered string
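A minimal sketch of the centering logic described above, assuming OCaml 4.14 or later for String.get_utf_8_uchar. The names uchars, add_pad and center_sketch are hypothetical, and this is not the library's actual implementation. In particular, giving an odd leftover character to the right-hand side is an assumption, since the split of an odd amount of padding is not spelled out here.

```ocaml
(* Decode a UTF-8 string into its code points. Invalid bytes decode to
   U+FFFD, which is what String.get_utf_8_uchar reports for them. *)
let uchars s =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

(* Append [n] code points to [buf] by cycling through the array [pad]. *)
let add_pad buf pad n =
  for i = 0 to n - 1 do
    Buffer.add_utf_8_uchar buf pad.(i mod Array.length pad)
  done

let center_sketch s len pad =
  let n = List.length (uchars s) in
  let pad = Array.of_list (uchars pad) in
  if n >= len || Array.length pad = 0 then s
  else begin
    let remains = len - n in
    let buf = Buffer.create (String.length s + (4 * remains)) in
    (* Assumption: an odd leftover goes to the right-hand side. *)
    add_pad buf pad (remains / 2);
    Buffer.add_string buf s;
    add_pad buf pad (remains - (remains / 2));
    Buffer.contents buf
  end
```

Working in code points rather than bytes is what lets the three-byte pad "あ" count as a single character toward len.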

val count : string -> string -> int

count str pattern counts how many Unicode characters in str match pattern.

The pattern supports:

  • character sets: e.g., "aeiou"
  • ranges: e.g., "a-k", "あ-ん"
  • negation with ^: e.g., "^a-k", "^0-9"

This function is Unicode-aware: str and pattern are decoded as UTF-8, and matching is done per code point.

Examples:

  • count "hello" "aeiou" returns 2
  • count "abc123" "^a-z" returns 3
  • count "こんにちは" "あ-ん" returns 5
  • parameter str

    The input string (UTF-8)

  • parameter pattern

    The character pattern (see above)

  • returns

    The number of matching characters

val delete : string -> string -> string

delete str pattern removes all Unicode characters in str that match pattern.

The pattern supports:

  • character sets: e.g., "aeiou"
  • ranges: e.g., "a-k", "あ-ん"
  • negation with ^: e.g., "^a-k", "^0-9"

This function is Unicode-aware: str and pattern are decoded as UTF-8, and characters are matched and removed per code point.

Examples:

  • delete "hello" "aeiou" returns "hll"
  • delete "こんにちは" "こ" returns "んにちは"
  • delete "abc123" "^a-z" returns "abc"
  • parameter str

    The input string (UTF-8)

  • parameter pattern

    The character pattern (see above)

  • returns

    The string with matched characters removed
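count and delete share one pattern syntax, so the sketch below parses a pattern once into code-point ranges and reuses the resulting predicate for both. It assumes OCaml 4.14 or later; the names parse_pattern, matches, count_sketch and delete_sketch are hypothetical. How the real library treats edge cases such as a lone "^" or a trailing "-" is not documented here, so the choices made in the code are assumptions.

```ocaml
let uchars s =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

(* A parsed pattern: inclusive code-point ranges plus a negation flag. *)
type pattern = { negated : bool; ranges : (int * int) list }

let parse_pattern p =
  let cs = List.map Uchar.to_int (uchars p) in
  (* Assumption: a leading '^' negates only if something follows it. *)
  let negated, cs =
    match cs with
    | c :: (_ :: _ as rest) when c = Char.code '^' -> (true, rest)
    | _ -> (false, cs)
  in
  let dash = Char.code '-' in
  (* "x-y" becomes the range (x, y); any other code point c becomes (c, c). *)
  let rec collect acc = function
    | lo :: d :: hi :: rest when d = dash -> collect ((lo, hi) :: acc) rest
    | c :: rest -> collect ((c, c) :: acc) rest
    | [] -> List.rev acc
  in
  { negated; ranges = collect [] cs }

(* A code point matches if it falls in some range, flipped by '^'. *)
let matches pat u =
  let c = Uchar.to_int u in
  List.exists (fun (lo, hi) -> lo <= c && c <= hi) pat.ranges <> pat.negated

let count_sketch str pattern =
  let pat = parse_pattern pattern in
  List.length (List.filter (matches pat) (uchars str))

(* Re-encodes the kept code points; invalid input bytes become U+FFFD. *)
let delete_sketch str pattern =
  let pat = parse_pattern pattern in
  let buf = Buffer.create (String.length str) in
  List.iter
    (fun u -> if not (matches pat u) then Buffer.add_utf_8_uchar buf u)
    (uchars str);
  Buffer.contents buf
```

Because the pattern is decoded as UTF-8 before parsing, a range like "あ-ん" covers every code point from U+3042 to U+3093 rather than a range of bytes.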

val len : string -> int

len str returns the number of Unicode code points (runes) in UTF-8 string str.

This function is Unicode-aware and counts characters, not bytes.

Examples:

  • len "hello" returns 5
  • len "こんにちは" returns 5
  • len "🍎🍏🍊" returns 3
  • parameter str

    The input string (UTF-8)

  • returns

    The number of Unicode code points in str
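One way to count code points without building an intermediate list, assuming OCaml 4.14 or later for String.get_utf_8_uchar and Uchar.utf_decode_length; len_sketch is a hypothetical name. Stdlib's String.length, by contrast, returns the byte count.

```ocaml
(* Step from one UTF-8 sequence to the next, counting each one.
   An invalid byte is consumed on its own and counts as one code point. *)
let len_sketch s =
  let rec go i n =
    if i >= String.length s then n
    else go (i + Uchar.utf_decode_length (String.get_utf_8_uchar s i)) (n + 1)
  in
  go 0 0
```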

val reverse : string -> string

reverse s reverses a UTF-8 encoded string s.

This function is Unicode-aware and reverses by code points, not bytes.

Examples:

  • reverse "hello" returns "olleh"
  • reverse "こんにちは" returns "はちにんこ"
  • reverse "🍎🍏🍊" returns "🍊🍏🍎"
  • parameter s

    The input string (UTF-8)

  • returns

    The reversed string
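A sketch of code-point reversal that copies each UTF-8 sequence as a byte slice, assuming OCaml 4.14 or later; reverse_sketch is a hypothetical name, not the library's implementation.

```ocaml
(* Collect the byte slice of each UTF-8 sequence, prepending as we go so
   the list ends up in reverse order, then concatenate. Bytes inside a
   sequence keep their order, so multibyte characters stay intact. *)
let reverse_sketch s =
  let rec slices i acc =
    if i >= String.length s then acc
    else
      let n = Uchar.utf_decode_length (String.get_utf_8_uchar s i) in
      slices (i + n) (String.sub s i n :: acc)
  in
  String.concat "" (slices 0 [])
```

Reversing by code points is not the same as reversing user-perceived characters: a letter followed by a combining accent, such as "e\u{301}", comes out with the accent first.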

val contains : string -> string -> bool

contains s substr reports whether substr is within s.

Returns true if substr is the empty string, or if substr occurs anywhere in s. Returns false otherwise.

This function is Unicode-agnostic and operates on bytes, not code points.

Examples:

  • contains "seafood" "foo" returns true
  • contains "seafood" "bar" returns false
  • contains "seafood" "" returns true
  • contains "" "" returns true
  • parameter s

    The input string

  • parameter substr

    The substring to search for

  • returns

    true if substr is found in s, false otherwise

val contains_any : string -> string -> bool

contains_any s chars reports whether any Unicode code points in chars are within s.

Returns false if chars is empty. This function is Unicode-aware and compares by code points.

Examples:

  • contains_any "team" "i" returns false
  • contains_any "fail" "ui" returns true
  • contains_any "ure" "ui" returns true
  • contains_any "failure" "ui" returns true
  • contains_any "foo" "" returns false
  • contains_any "" "" returns false
  • parameter s

    The input string (UTF-8)

  • parameter chars

    The set of Unicode code points to search for (UTF-8)

  • returns

    true if any code point in chars is found in s, false otherwise
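A short sketch of the code-point comparison, assuming OCaml 4.14 or later; uchars and contains_any_sketch are hypothetical names. The nested List.exists is quadratic in the worst case, which is acceptable for the short chars sets this function is typically given.

```ocaml
let uchars s =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

(* True if some code point of s also occurs in chars. An empty chars
   yields false because List.exists over [] is false. *)
let contains_any_sketch s chars =
  let set = uchars chars in
  List.exists (fun u -> List.exists (Uchar.equal u) set) (uchars s)
```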

val has_prefix : string -> string -> bool

has_prefix s prefix reports whether the string s begins with prefix.

Returns true if prefix is the empty string, or if s starts with prefix. Returns false otherwise.

This function is Unicode-agnostic and operates on bytes, not code points.

Examples:

  • has_prefix "Gopher" "Go" returns true
  • has_prefix "Gopher" "C" returns false
  • has_prefix "Gopher" "" returns true
  • parameter s

    The input string

  • parameter prefix

    The prefix to test

  • returns

    true if s starts with prefix, false otherwise

val has_suffix : string -> string -> bool

has_suffix s suffix reports whether the string s ends with suffix.

Returns true if suffix is the empty string, or if s ends with suffix. Returns false otherwise.

This function is Unicode-agnostic and operates on bytes, not code points.

Examples:

  • has_suffix "Amigo" "go" returns true
  • has_suffix "Amigo" "O" returns false
  • has_suffix "Amigo" "Ami" returns false
  • has_suffix "Amigo" "" returns true
  • parameter s

    The input string

  • parameter suffix

    The suffix to test

  • returns

    true if s ends with suffix, false otherwise
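Because has_prefix and has_suffix compare bytes, both can be sketched with plain String.sub; the *_sketch names are hypothetical. Since OCaml 4.13, Stdlib also provides the same byte-wise tests as String.starts_with ~prefix s and String.ends_with ~suffix s.

```ocaml
(* An empty prefix has length 0, so String.sub s 0 0 = "" and the test is true. *)
let has_prefix_sketch s prefix =
  let n = String.length prefix in
  n <= String.length s && String.sub s 0 n = prefix

(* Compare the last [n] bytes of s with suffix. *)
let has_suffix_sketch s suffix =
  let m = String.length s and n = String.length suffix in
  n <= m && String.sub s (m - n) n = suffix
```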

val count_substring : string -> string -> int

count_substring s substr counts the number of non-overlapping instances of substr in s.

If substr is the empty string, returns 1 + the number of Unicode code points in s.

Apart from the empty-substring case, this function is Unicode-agnostic and matches bytes, not code points.

Examples:

  • count_substring "cheese" "e" returns 3
  • count_substring "five" "" returns 5
  • count_substring "banana" "na" returns 2
  • count_substring "aaaaa" "aa" returns 2
  • count_substring "" "" returns 1
  • count_substring "" "a" returns 0
  • parameter s

    The input string

  • parameter substr

    The substring to count

  • returns

    The number of non-overlapping instances of substr in s
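A sketch of the non-overlapping count, including the empty-substring rule, assuming OCaml 4.14 or later; count_substring_sketch is a hypothetical name, and the naive scan is chosen for clarity rather than speed.

```ocaml
let count_substring_sketch s sub =
  let n = String.length s and m = String.length sub in
  if m = 0 then begin
    (* Empty substr: 1 + the number of code points in s, as documented. *)
    let rec cps i k =
      if i >= n then k
      else cps (i + Uchar.utf_decode_length (String.get_utf_8_uchar s i)) (k + 1)
    in
    1 + cps 0 0
  end else begin
    (* After a match, resume just past it so matches never overlap. *)
    let rec go i acc =
      if i + m > n then acc
      else if String.sub s i m = sub then go (i + m) (acc + 1)
      else go (i + 1) acc
    in
    go 0 0
  end
```

Skipping past each match (i + m rather than i + 1) is what makes "aaaaa" yield 2 for the pattern "aa"; an overlapping count would give 4.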

val equal_fold : string -> string -> bool

equal_fold s t reports whether s and t, interpreted as UTF-8 strings, are equal when ASCII letters are compared case-insensitively.

This is a simple case-insensitive comparison for ASCII letters only. (It does not perform full Unicode case folding.)

Examples:

  • equal_fold "Go" "go" returns true
  • equal_fold "AB" "ab" returns true
  • equal_fold "ß" "ss" returns false
  • parameter s

    The first string (UTF-8)

  • parameter t

    The second string (UTF-8)

  • returns

    true if s and t are equal under simple case folding, false otherwise
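The ASCII-only comparison can be sketched with Stdlib's String.lowercase_ascii, which changes only the bytes 'A' to 'Z'; equal_fold_sketch is a hypothetical name, not the library's code.

```ocaml
(* Every byte of a multibyte UTF-8 sequence is >= 0x80, so
   lowercase_ascii leaves it untouched: non-ASCII letters must match
   exactly, which is why "ß" and "ss" (or "É" and "é") compare unequal. *)
let equal_fold_sketch s t =
  String.equal (String.lowercase_ascii s) (String.lowercase_ascii t)
```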

val fields : string -> string list

fields s splits the string s around each instance of one or more consecutive Unicode whitespace characters, returning a list of substrings of s, or an empty list if s is empty or contains only whitespace.

Whitespace is defined by Unicode (see is_space).

Examples:

  • fields " foo bar baz " returns ["foo"; "bar"; "baz"]
  • fields " " returns []
  • fields "a\tb\nc" returns ["a"; "b"; "c"]
  • parameter s

    The input string (UTF-8)

  • returns

    List of non-whitespace substrings of s

val fields_func : string -> (Uchar.t -> bool) -> string list

fields_func s f splits the string s at each run of Unicode code points c satisfying f c, returning a list of substrings of s or an empty list if all code points in s satisfy f or s is empty.

Examples:

  • fields_func " foo1;bar2,baz3..." (fun c -> not (is_letter c || is_number c)) returns ["foo1"; "bar2"; "baz3"]
  • parameter s

    The input string (UTF-8)

  • parameter f

    The predicate function on Unicode code points

  • returns

    List of non-separator substrings of s
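A sketch of fields_func that scans code points and remembers the byte offset where the current field began, with fields built on top of it, assuming OCaml 4.14 or later. fields_func_sketch, fields_sketch and is_space_sketch are hypothetical names; in particular, is_space_sketch recognizes only a few ASCII and Latin-1 whitespace code points and is not the module's Unicode-aware is_space.

```ocaml
let fields_func_sketch s f =
  let n = String.length s in
  (* Close the field that began at byte [start] (if any) before byte [stop]. *)
  let close start stop acc =
    if start >= 0 then String.sub s start (stop - start) :: acc else acc
  in
  (* [start] is -1 while we are inside a run of separators. *)
  let rec go i start acc =
    if i >= n then List.rev (close start n acc)
    else
      let d = String.get_utf_8_uchar s i in
      let next = i + Uchar.utf_decode_length d in
      if f (Uchar.utf_decode_uchar d) then go next (-1) (close start i acc)
      else go next (if start >= 0 then start else i) acc
  in
  go 0 (-1) []

(* Hypothetical stand-in for is_space: tab, LF, VT, FF, CR, space,
   NEL (U+0085) and NBSP (U+00A0) only. *)
let is_space_sketch u =
  match Uchar.to_int u with
  | 0x09 | 0x0A | 0x0B | 0x0C | 0x0D | 0x20 | 0x85 | 0xA0 -> true
  | _ -> false

let fields_sketch s = fields_func_sketch s is_space_sketch
```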

val index : string -> string -> int

index s substr returns the index of the first instance of substr in s, or -1 if substr is not present.

The index is a byte offset, not a code point index.

Examples:

  • index "chicken" "ken" returns 4
  • index "chicken" "dmr" returns -1
  • index "abc" "" returns 0
  • index "" "" returns 0
  • index "" "a" returns -1
  • parameter s

    The input string

  • parameter substr

    The substring to search for

  • returns

    The byte index of the first occurrence, or -1 if not found
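A naive byte-wise search that reproduces the documented index results, with contains expressed on top of it. index_sketch and contains_sketch are hypothetical names; a production version would more likely use a faster search algorithm.

```ocaml
let index_sketch s sub =
  let n = String.length s and m = String.length sub in
  (* Does sub occur at byte offset i? Callers guarantee i + m <= n. *)
  let rec matches_at i j = j >= m || (s.[i + j] = sub.[j] && matches_at i (j + 1)) in
  let rec go i =
    if i + m > n then -1
    else if matches_at i 0 then i
    else go (i + 1)
  in
  go 0

(* The empty substring is found at offset 0, so contains "" is always true. *)
let contains_sketch s sub = index_sketch s sub >= 0
```

Because offsets are in bytes, searching "こんにちは" for "に" gives 6 rather than 2: each of the two preceding characters takes three bytes.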

val repeat : string -> int -> string

repeat s count returns a new string consisting of count copies of s.

Raises Invalid_argument if count is negative.

Examples:

  • repeat "na" 2 returns "nana"
  • repeat "🍎" 3 returns "🍎🍎🍎"
  • repeat "" 5 returns ""
  • repeat "a" 0 returns ""
  • repeat "abc" (-1) raises Invalid_argument
  • parameter s

    The string to repeat

  • parameter count

    The number of times to repeat s

  • returns

    The repeated string
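A sketch of repeat using only Stdlib; repeat_sketch is a hypothetical name. The explicit check mirrors the documented Invalid_argument for a negative count.

```ocaml
let repeat_sketch s count =
  if count < 0 then invalid_arg "repeat: negative count"
  else String.concat "" (List.init count (fun _ -> s))
```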

val join : string list -> string -> string

join elems sep concatenates the elements of elems, inserting sep between each element.

Returns the empty string if elems is empty.

Examples:

  • join ["foo"; "bar"; "baz"] ", " returns "foo, bar, baz"
  • join [] ", " returns ""
  • join ["a"] ", " returns "a"
  • parameter elems

    The list of strings to join

  • parameter sep

    The separator string

  • returns

    The joined string
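Going by the behavior described above, join appears to compute the same result as Stdlib's String.concat with the arguments swapped. The one-line sketch below assumes that equivalence; join_sketch is a hypothetical name.

```ocaml
(* Stdlib's String.concat takes the separator first; join takes it last. *)
let join_sketch elems sep = String.concat sep elems
```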