tokenizer

Native tokenizer module.

This module provides the native tokenizer backend used by core.tokenizer when native tokenization is enabled.

tokenizer.pattern_stats

Per-pattern native tokenizer compilation and runtime counters.

close_code

(field) close_code: string?

Closer pattern code used by the native tokenizer.

close_fast_kind

(field) close_fast_kind: integer

Native fast-path kind used by the closer pattern.

code

(field) code: string?

Opener pattern code used by the native tokenizer.

fallback_match_calls

(field) fallback_match_calls: integer

Number of fallback matcher calls for this pattern.

fast_kind

(field) fast_kind: integer

Native fast-path kind used by the opener pattern.

pattern

(field) pattern: string?

Display pattern from the syntax definition.

skipped_by_starter

(field) skipped_by_starter: integer

Number of matches skipped by starter filtering.

unknown_starter

(field) unknown_starter: boolean

True when the set of bytes that can start the opener pattern is unknown.

tokenizer.resume

Resume information returned by tokenizer.tokenize() when tokenization does not finish within the current frame budget.

i

(field) i: integer

Next character position to continue tokenizing from.

res

(field) res: string[]

Tokens accumulated so far, as a flat array of alternating type and text entries: { type, text, type, text, ... }.

state

(field) state: string

Tokenizer state that should be reused on resume.
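Because the token arrays used by this module (`res` here, and the `tokens` value returned by `tokenizer.tokenize()`) alternate type and text entries, consumers usually walk them two elements at a time. A minimal sketch; `each_token` is a hypothetical helper, not part of the module, and the token array is hand-written for illustration:

```lua
-- Hypothetical helper (not part of the tokenizer module): iterate a flat
-- { type, text, type, text, ... } token array as (type, text) pairs.
local function each_token(tokens)
  local i = -1
  return function()
    i = i + 2
    if tokens[i] ~= nil then
      return tokens[i], tokens[i + 1]
    end
  end
end

-- Illustrative, hand-written token array.
local tokens = { "keyword", "local", "normal", " ", "symbol", "x" }
local parts = {}
for token_type, text in each_token(tokens) do
  parts[#parts + 1] = token_type .. "=" .. text
end
print(table.concat(parts, ", "))
```

A stateful closure keeps the sketch allocation-free per step; a numeric `for i = 1, #tokens, 2` loop works just as well.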

tokenizer.syntax_stats

Native tokenizer compilation and runtime counters for a syntax.

compiled_patterns

(field) compiled_patterns: integer

Number of patterns with a native fast path.

fallback_match_calls

(field) fallback_match_calls: integer

Number of fallback matcher calls for this syntax.

fallback_patterns

(field) fallback_patterns: integer

Number of patterns using the fallback matcher.

has_unknown_starters

(field) has_unknown_starters: boolean

True when the start bytes of at least one pattern are unknown.

normal_run_skips

(field) normal_run_skips: integer

Number of normal text runs skipped by starter filtering.

pattern_stats

(field) pattern_stats: tokenizer.pattern_stats[]

Per-pattern counters.

patterns

(field) patterns: integer

Number of patterns imported from the syntax.

skipped_by_starter

(field) skipped_by_starter: integer

Number of matches skipped by starter filtering.
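These counters can be condensed into a one-line report, for example to compare how much of a syntax the native backend covers. A minimal sketch that relies only on the documented field names; `summarize_stats` is a hypothetical helper and the sample values are made up rather than taken from a real `tokenizer.get_syntax_stats()` call:

```lua
-- Hypothetical helper: format a table shaped like tokenizer.syntax_stats.
local function summarize_stats(stats)
  local total = stats.patterns
  local pct = 0
  if total > 0 then
    pct = 100 * stats.compiled_patterns / total
  end
  return string.format(
    "%d/%d patterns native (%.0f%%), %d fallback, %d fallback calls%s",
    stats.compiled_patterns, total, pct, stats.fallback_patterns,
    stats.fallback_match_calls,
    stats.has_unknown_starters and ", some unknown starters" or ""
  )
end

-- Illustrative values only.
local summary = summarize_stats({
  patterns = 10, compiled_patterns = 8, fallback_patterns = 2,
  fallback_match_calls = 37, has_unknown_starters = true,
})
print(summary)
```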

extract_subsyntaxes

function tokenizer.extract_subsyntaxes(base_syntax: core.syntax.syntax, state: string)
  -> syntaxes: core.syntax.syntax[]

Return the list of syntaxes active for the given tokenizer state.

@param base_syntax — The base syntax of the document.

@param state — Tokenizer state previously returned by tokenizer.tokenize().

@return syntaxes — Array of syntaxes starting from the innermost one.
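Since the returned array is ordered from the innermost syntax outward, a caller can scan it to find the nearest syntax that defines some property. A sketch, assuming Lite XL-style syntax tables that may carry an optional `comment` string; `innermost_comment` is a hypothetical helper and the syntax tables below are stand-ins for real `core.syntax.syntax` values:

```lua
-- Hypothetical helper: given the list returned by
-- tokenizer.extract_subsyntaxes() (innermost first), return the first
-- `comment` marker found, plus the syntax that defines it.
local function innermost_comment(syntaxes)
  for _, syntax in ipairs(syntaxes) do
    if syntax.comment then
      return syntax.comment, syntax
    end
  end
  return nil
end

-- Illustrative stand-ins: JavaScript embedded in an HTML document.
local html = { name = "HTML" }                  -- no line comment marker
local js = { name = "JavaScript", comment = "//" }

print(innermost_comment({ js, html }))          -- inside a <script> block
print(innermost_comment({ html }))              -- plain HTML
```

In the editor, `syntaxes` would come from `tokenizer.extract_subsyntaxes(base_syntax, state)` with a state previously returned by `tokenizer.tokenize()`.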

get_syntax_stats

function tokenizer.get_syntax_stats(syntax: core.syntax.syntax)
  -> stats: tokenizer.syntax_stats

Return native tokenizer compilation and runtime counters for a syntax.

@param syntax — The syntax to inspect.

@return stats — Native compilation and runtime counters.
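To see which patterns lean most on the slower fallback matcher, the entries of `stats.pattern_stats` can be ranked by `fallback_match_calls`. A sketch assuming only the documented fields; `hottest_fallbacks` is a hypothetical helper, and the sample table stands in for a real `tokenizer.get_syntax_stats(syntax)` result:

```lua
-- Hypothetical helper: return up to `limit` pattern_stats entries with the
-- most fallback matcher calls, highest first. Entries with zero calls are
-- ignored; the input table is not modified.
local function hottest_fallbacks(stats, limit)
  local hot = {}
  for _, p in ipairs(stats.pattern_stats) do
    if p.fallback_match_calls > 0 then
      hot[#hot + 1] = p
    end
  end
  table.sort(hot, function(a, b)
    return a.fallback_match_calls > b.fallback_match_calls
  end)
  -- Trim the sorted list down to `limit` entries.
  for i = #hot, limit + 1, -1 do
    hot[i] = nil
  end
  return hot
end

-- Illustrative counters only.
local stats = {
  pattern_stats = {
    { pattern = "0x%x+", fallback_match_calls = 15 },
    { pattern = "%d+", fallback_match_calls = 0 },
    { pattern = "[%a_][%w_]*", fallback_match_calls = 120 },
  },
}

local top = hottest_fallbacks(stats, 1)
for _, p in ipairs(top) do
  print(p.fallback_match_calls, p.pattern)
end
```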

tokenize

function tokenizer.tokenize(incoming_syntax: core.syntax.syntax, text: string, state?: string, resume?: tokenizer.resume)
  -> tokens: string[]
  2. state: string
  3. resume: (tokenizer.resume)?

Tokenize a single line of text using the given syntax and state.

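Returns tokens as a flat array of alternating type and text entries ({ type, text, ... }). If the tokenizer exhausts its time budget before reaching the end of the line, it also returns a third value containing resume data; pass it back through the resume parameter to continue tokenizing the same line later.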

@param incoming_syntax — The syntax to tokenize against.

@param text — The line text to tokenize.

@param state — Current tokenizer state.

@param resume — Resume data from a previous incomplete call.

@return tokens — Tokens in the form { type, text, ... }.

@return state — Updated tokenizer state.

@return resume — Resume data when tokenization yields before finishing.
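A caller that wants the complete result for a line can keep calling `tokenize()` until no resume data comes back. A minimal sketch, assuming it runs inside a coroutine (for example a background highlighting task) so it can yield between partial calls, and assuming each retry passes the same syntax, text and starting state together with the resume data. `tokenize_fully`, its `yield` parameter and the stub tokenizer are hypothetical; the tokenize function is injected so the sketch can be exercised without the native module:

```lua
-- Hypothetical driver: call `tokenize` until it stops returning resume data.
-- `yield` (optional) runs between partial results, e.g. coroutine.yield.
local function tokenize_fully(tokenize, syntax, text, state, yield)
  local tokens, new_state, resume = tokenize(syntax, text, state)
  while resume do
    if yield then yield() end
    -- Retry the same line, handing back the resume data from the last call.
    tokens, new_state, resume = tokenize(syntax, text, state, resume)
  end
  return tokens, new_state
end

-- Stub standing in for tokenizer.tokenize (illustrative only): it pretends
-- the time budget runs out after three characters on the first call.
local calls = 0
local function stub_tokenize(_, text, state, resume)
  calls = calls + 1
  if not resume then
    return {}, state, { i = 4, res = { "normal", text:sub(1, 3) }, state = state or "" }
  end
  local res = resume.res
  res[#res + 1] = "normal"
  res[#res + 1] = text:sub(resume.i)
  return res, resume.state, nil
end

local yields = 0
local tokens, final_state = tokenize_fully(stub_tokenize, {}, "local x", nil,
  function() yields = yields + 1 end)
print(calls, yields, tokens[2] .. tokens[4])
```

Yielding between calls keeps the frame-budget behaviour intact; calling in a tight loop without yielding would finish the line but give up the responsiveness the resume mechanism exists for.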
