Skip to main content

Creating Syntaxes

Syntax highlighting plugins for Pragtical are Lua files. These define some patterns or regular expressions that match different parts of a given language, assigning token types to each match. These different token types are then given different colors by your chosen color scheme.

What syntax token types are supported?

The supported syntax token types, defined by pragtical/core/style.lua, are:

  • normal
  • symbol
  • comment
  • keyword
  • keyword2
  • number
  • literal
  • string
  • operator
  • function

In your syntax highlighting plugin, you write patterns to match parts of the language syntax, assigning these token types to matches. You don't have to use them all - just use as many as you need for your language.

Let's walk through an example syntax definition and see how this works.

Example syntax: ssh config files

This is a small, simple example of a syntax definition. It's intended to highlight SSH Config files and looks like this:

-- mod-version:3
local syntax = require "core.syntax"

syntax.add {
files = { "sshd?/?_?config$" },
comment = '#',
patterns = {
{ pattern = "#.*\n", type = "comment" },
{ pattern = "%d+", type = "number" },
{ pattern = "[%a_][%w_]*", type = "symbol" },
{ pattern = "@", type = "operator" },
},
symbols = {
-- ssh config
["Host"] = "function",
["ProxyCommand"] = "function",

["HostName"] = "keyword",
["IdentityFile"] = "keyword",
...

-- sshd config
["Subsystem"] = "keyword2",

-- Literals
["yes"] = "literal",
["no"] = "literal",
["any"] = "literal",
["ask"] = "literal",
},
}

Let's take each section in turn and see how it works.

The first line is a Lua comment and tells Pragtical which version this plugin requires. The second imports the core.syntax module to allow us to declare a new syntax:

-- mod-version:3
local syntax = require "core.syntax"

We then add a syntax definition to Pragtical with syntax.add {...}. The contents of this definition are covered next.

Files

The files property tells Pragtical which files this syntax should be used for. This is a Lua pattern that matches against the full path of the current file. For example, to match against Markdown files (.md or a .markdown files), you could do this:

files = { "%.md$", "%.markdown$" },

In our original example, we match against the end of the path rather than the extension, because SSH config files don't have extensions, and we don't want to match all config files. We expect the path for SSH config files to look something like one of these:

  • ~/.ssh/config
  • /etc/ssh/ssh_config
  • /etc/ssh/sshd_config

This pattern matches paths that look like that:

files = { "sshd?/?_?config$" },

Comment

The comment property is used to tell Pragtical what to insert in order to create a comment. It is not a part of syntax definition. You can also use block_comment to tell Pragtical how to create multiline / block comments.

Patterns

A given piece of text can only match one pattern. Once Pragtical decides that a piece of text matches a pattern, it will assign that token type to it and move on. Patterns are tested in the order that they are written in the syntax definition, so the first match will win.

Patterns are based on Lua patterns or PCRE2.

You may find detailed information on Lua patterns in the [Lua Reference Manual]. For PCRE, there are various [regex tester websites] that provide documentation.

Lua patterns can be used by specifying pattern when defining a pattern, while PCRE can be used by specifying regex when defining a pattern.

Each pattern takes one of the following forms:

Simple Pattern

{ pattern = "#.*\n",        type = "comment" },

When pattern is a string, Pragtical will test the input against the pattern. If the input matches, Pragtical will assign the given token type to the input.

In this case, any line starting with # will be assigned the type comment.

Start & End Pattern

{ pattern = { "%[", "%]" }, type = "keyword" },

When pattern is a table with 2 elements, Pragtical will use them to test for the start and the end of a range. Everything between the start and the end will be assigned the given token type.

In this case, everything between [ and ] will be assigned the type keyword.

However, it does not account for escape sequences. Inputs such as [\]] will be interpreted wrongly as [\] and ].

Start & End Pattern with Escape

{ pattern = { '"', '"', '\\' }, type = "string" },

When pattern is a table with 3 elements, Pragtical will use the first two to test for the start and the end of a range. The last element is used to denote an "escape sequence". If the text matches the 3rd element followed by the 2nd element, it will not be interpreted as the end of a range.

In this case, everything between " and " will be assigned the type string. A string can have escape sequences prefixed with \.

Given the input "\"Hello John\"", the entire input will be assigned the type string.

Symbols

This is not related to the symbol token type.

The symbols section allows you to assign token types to particular keywords or strings - usually reserved words in the language you are highlighting. The token type in this section always take precedence over token types declared in patterns.

For example this highlights Host using the function token type, HostName as a keyword, yes, no, any and ask as a literal:

["Host"]                         = "function",
["HostName"] = "keyword",

["yes"] = "literal",
["no"] = "literal",
["any"] = "literal",
["ask"] = "literal",

Tips: double-check your patterns!

There are a few common mistakes that can be made when using the symbols table in conjunction with patterns.

Case 1: Spaces between two symbols tokens

Let's have an example:

{ pattern = "[%a_][%w_]+%s+()[%a_][%w_]+", type = { "keyword2", "symbol" } }

Let's explain the pattern a bit (omitting the empty parentheses):

[%a_] = any alphabet and underscore
[%w_] = any alphabet, number and underscore
%s = any whitespace character

WORD =
[%a_] followed by (1 or more [%w_])

pattern =
WORD followed by (one or more %s) followed by WORD

Afterwards, you add an entry ["my"] = "literal" in the symbols table. You test the syntax with my function and found that "my" isn't highlighted as literal. Why did that happen?

symbols table requires an exact match. If you look carefully, the empty parentheses (()) is placed after the space! This tells Pragtical that WORD followed by (one or more %s) is a token, which will match my (note the space in the match).

The fix is to add a normal token for the whitespace between the two tokens:

{ pattern = "[%a_][%w_]+()%s+()[%a_][%w_]+", type = { "keyword2", "normal", "symbol" } }
Case 2: Patterns & symbols tokens

One might assume that Pragtical magically matches text against the symbols table. This is not the case.

In some languages, people may add a generic pattern to delegate the matching to the symbols table.

{ pattern = "[%a_][%w_]*", "symbol" }

However, the symbols table may look like this:

symbols = {
["my-symbol"] = "function",
["..something_else"] = "literal"
}

"my-symbol contains a dash (-) and "..something_else" contains 2 dots (.). None of the characters are matched by [%a_][%w_]*!

Beware of the text you intend to match in the symbols table. If you want to use it, you need to ensure that it matches one of the patterns.

The correct patterns are:

{ pattern = "[%a_][%w%-_]*", "symbol" },
{ pattern = "%.%.[%a_][%w_]*", "symbol" },

Testing Your New Syntax

To test your new syntax highlighting you need to do two things:

  • Reload the Pragtical core
  • Load a file in your chosen language and see how it looks

To reload the core, you can either restart Pragtical or reload it. To do this, type ctrl+shit+p to open the command palette, then select Core: Restart (or type crr or something similar to match it), then press Enter. You will need to restart the core after any changes you make to the syntax highlighting definition.

Example advanced syntax: Markdown

!!! note This example has features from 2.1. It is not compatible with older versions of Pragtical.

Not all languages are as simple as SSH config files. Markup languages like HTML and Markdown are especially hard to parse correctly. Here's the Markdown syntax file in its full glory:

-- mod-version:3
local syntax = require "core.syntax"
local style = require "core.style"
local core = require "core"

local initial_color = style.syntax["keyword2"]

-- Add 3 type of font styles for use on markdown files
for _, attr in pairs({"bold", "italic", "bold_italic"}) do
local attributes = {}
if attr ~= "bold_italic" then
attributes[attr] = true
else
attributes["bold"] = true
attributes["italic"] = true
end
-- no way to copy user custom font with additional attributes :(
style.syntax_fonts["markdown_"..attr] = renderer.font.load(
DATADIR .. "/fonts/JetBrainsMono-Regular.ttf",
style.code_font:get_size(),
attributes
)
-- also add a color for it
style.syntax["markdown_"..attr] = style.syntax["keyword2"]
end

local in_squares_match = "^%[%]"
local in_parenthesis_match = "^%(%)"

syntax.add {
name = "Markdown",
files = { "%.md$", "%.markdown$" },
block_comment = { "<!--", "-->" },
space_handling = false, -- turn off this feature to handle it our selfs
patterns = {
---- Place patterns that require spaces at start to optimize matching speed
---- and apply the %s+ optimization immediately afterwards
-- bullets
{ pattern = "^%s*%*%s", type = "number" },
{ pattern = "^%s*%-%s", type = "number" },
{ pattern = "^%s*%+%s", type = "number" },
-- numbered bullet
{ pattern = "^%s*[0-9]+[%.%)]%s", type = "number" },
-- blockquote
{ pattern = "^%s*>+%s", type = "string" },
-- alternative bold italic formats
{ pattern = { "%s___", "___%f[%s]" }, type = "markdown_bold_italic" },
{ pattern = { "%s__", "__%f[%s]" }, type = "markdown_bold" },
{ pattern = { "%s_[%S]", "_%f[%s]" }, type = "markdown_italic" },
-- reference links
{
pattern = "^%s*%[%^()["..in_squares_match.."]+()%]: ",
type = { "function", "number", "function" }
},
{
pattern = "^%s*%[%^?()["..in_squares_match.."]+()%]:%s+.+\n",
type = { "function", "number", "function" }
},
-- optimization
{ pattern = "%s+", type = "normal" },

---- HTML rules imported and adapted from language_html
---- to not conflict with markdown rules
-- Inline JS and CSS
{
pattern = {
"<%s*[sS][cC][rR][iI][pP][tT]%s+[tT][yY][pP][eE]%s*=%s*" ..
"['\"]%a+/[jJ][aA][vV][aA][sS][cC][rR][iI][pP][tT]['\"]%s*>",
"<%s*/[sS][cC][rR][iI][pP][tT]>"
},
syntax = ".js",
type = "function"
},
{
pattern = {
"<%s*[sS][cC][rR][iI][pP][tT]%s*>",
"<%s*/%s*[sS][cC][rR][iI][pP][tT]>"
},
syntax = ".js",
type = "function"
},
{
pattern = {
"<%s*[sS][tT][yY][lL][eE][^>]*>",
"<%s*/%s*[sS][tT][yY][lL][eE]%s*>"
},
syntax = ".css",
type = "function"
},
-- Comments
{ pattern = { "<!%-%-", "%-%->" }, type = "comment" },
-- Tags
{ pattern = "%f[^<]![%a_][%w_]*", type = "keyword2" },
{ pattern = "%f[^<][%a_][%w_]*", type = "function" },
{ pattern = "%f[^<]/[%a_][%w_]*", type = "function" },
-- Attributes
{
pattern = "[a-z%-]+%s*()=%s*()\".-\"",
type = { "keyword", "operator", "string" }
},
{
pattern = "[a-z%-]+%s*()=%s*()'.-'",
type = { "keyword", "operator", "string" }
},
{
pattern = "[a-z%-]+%s*()=%s*()%-?%d[%d%.]*",
type = { "keyword", "operator", "number" }
},
-- Entities
{ pattern = "&#?[a-zA-Z0-9]+;", type = "keyword2" },

---- Markdown rules
-- math
{ pattern = { "%$%$", "%$%$", "\\" }, type = "string", syntax = ".tex"},
{ pattern = { "%$", "%$", "\\" }, type = "string", syntax = ".tex"},
-- code blocks
{ pattern = { "```c++", "```" }, type = "string", syntax = ".cpp" },
-- ... there's some other patterns here, but I removed them for brevity
{ pattern = { "```lobster", "```" }, type = "string", syntax = ".lobster" },
{ pattern = { "```", "```" }, type = "string" },
{ pattern = { "``", "``" }, type = "string" },
{ pattern = { "%f[\\`]%`[%S]", "`" }, type = "string" },
-- strike
{ pattern = { "~~", "~~" }, type = "keyword2" },
-- highlight
{ pattern = { "==", "==" }, type = "literal" },
-- lines
{ pattern = "^%-%-%-+\n", type = "comment" },
{ pattern = "^%*%*%*+\n", type = "comment" },
{ pattern = "^___+\n", type = "comment" },
-- bold and italic
{ pattern = { "%*%*%*%S", "%*%*%*" }, type = "markdown_bold_italic" },
{ pattern = { "%*%*%S", "%*%*" }, type = "markdown_bold" },
-- handle edge case where asterisk can be at end of line and not close
{
pattern = { "%f[\\%*]%*[%S]", "%*%f[^%*]" },
type = "markdown_italic"
},
-- alternative bold italic formats
{ pattern = "^___[%s%p%w]+___%s" , type = "markdown_bold_italic" },
{ pattern = "^__[%s%p%w]+__%s" , type = "markdown_bold" },
{ pattern = "^_[%s%p%w]+_%s" , type = "markdown_italic" },
-- heading with custom id
{
pattern = "^#+%s[%w%s%p]+(){()#[%w%-]+()}",
type = { "keyword", "function", "string", "function" }
},
-- headings
{ pattern = "^#+%s.+\n", type = "keyword" },
-- superscript and subscript
{
pattern = "%^()%d+()%^",
type = { "function", "number", "function" }
},
{
pattern = "%~()%d+()%~",
type = { "function", "number", "function" }
},
-- definitions
{ pattern = "^:%s.+", type = "function" },
-- emoji
{ pattern = ":[a-zA-Z0-9_%-]+:", type = "literal" },
-- images and link
{
pattern = "!?%[!?%[()["..in_squares_match.."]+()%]%(()["..in_parenthesis_match.."]+()%)%]%(()["..in_parenthesis_match.."]+()%)",
type = { "function", "string", "function", "number", "function", "number", "function" }
},
{
pattern = "!?%[!?%[?()["..in_squares_match.."]+()%]?%]%(()["..in_parenthesis_match.."]+()%)",
type = { "function", "string", "function", "number", "function" }
},
-- reference links
{
pattern = "%[()["..in_squares_match.."]+()%] *()%[()["..in_squares_match.."]+()%]",
type = { "function", "string", "function", "function", "number", "function" }
},
{
pattern = "!?%[%^?()["..in_squares_match.."]+()%]",
type = { "function", "number", "function" }
},
-- url's and email
{
pattern = "<[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+%.[a-zA-Z0-9-.]+>",
type = "function"
},
{ pattern = "<https?://%S+>", type = "function" },
{ pattern = "https?://%S+", type = "function" },
-- optimize consecutive dashes used in tables
{ pattern = "%-+", type = "normal" },
},
symbols = { },
}

-- Adjust the color on theme changes
core.add_thread(function()
while true do
if initial_color ~= style.syntax["keyword2"] then
for _, attr in pairs({"bold", "italic", "bold_italic"}) do
style.syntax["markdown_"..attr] = style.syntax["keyword2"]
end
initial_color = style.syntax["keyword2"]
end
coroutine.yield(1)
end
end)

It demonstrates a lot of syntax highlighting features that were added to v2.1.0 and some workarounds needed.

Syntax fonts (Since 1.16.10)

The syntax allows users to set different font styles (bold, italic, etc.) for different patterns. To change the font style of a token, add a Font to style.syntax_fonts[token_type].

For example:

-- will ensure every "fancysyntax_fancy_token" is italic
style.syntax_fonts["fancysyntax_fancy_token"] = renderer.font.load("myfont.ttf", 14 * SCALE, { italic = true })

The markdown example automates this with a for loop.

The limitations here are that fonts cannot be copied with different attributes, thus the font path has to be hard-coded. Other than that, abusing style.syntax_fonts may lead to slow performance and high memory consumption. This is very obvious when the user tries to resize the editor with ctrl-scroll or ctrl+ and ctrl-. Please use it in moderation.

Space handling (since v2.1.0)

By default, Pragtical prepends a pattern { pattern = "%s+", type = "normal" } to the syntax. This improves the performance drastically on lines that starts with whitespace (e.g. heavily indented lines). It works by matching the whitespace before other patterns in order to prevent Pragtical from iterating the entire syntax. However, there may be syntaxes that require matching spaces (e.g. Markdown with indented blocks) so this can be disabled by setting space_handling to false.

To keep the space handling optimization or to support older versions of Pragtical, { pattern = "%s+", type = "normal" } can be added after patterns that require space.

Simple patterns with multiple tokens (v1.16.10)

This is an excerpt taken from the Markdown plugin:

local in_squares_match = "^%[%]"
-- reference links
{
pattern = "^%s*%[%^()["..in_squares_match.."]+()%]: ",
type = { "function", "number", "function" }
},

Sometimes it makes sense to highlight different parts of a pattern differently. An empty parenthesis (()) in Lua patterns will return the position of the text in the parentheses. This will tell Pragtical when to change the type of token. For instance, ^%s*%[%^ is "function", ["..in_squares_match.."]+ is "number" and %]: is "function".

Subsyntaxes (Since v1.16.10)

Pragtical supports embedding another syntax into the existing syntax. This is used to support code blocks inside the Markdown syntax.

For example:

{ pattern = { "```cpp", "```" },        type = "string", syntax = ".cpp" },

This would highlight ```cpp and ``` with "string" while everything inside them will be highlighted with a syntax that matches ".cpp".