| tags: [ Software Rust Tree-sitter ]
Using Tree-sitter Parsers in Rust
Update: 2022-12-26: Fixed unwrap() on Err by updating to latest tree-sitter
Update: 2019-08-10: Fixed rerun directive in build.rs
Tree-sitter is a parser generator tool and parsing library. It generates portable parsers that can be used in several languages including Rust. Tree-sitter grammars are available for several languages.
This is a game changer because it lowers the barrier to entry for writing language tooling. You no longer need to write your own parser. With Tree-sitter, you can now simply use an existing parser.
Toolchain
Tree-sitter grammars are written in Javascript. The grammars are executed using Node to generate the grammar JSON. The Tree-sitter CLI uses the grammar JSON to generate a C-based parser. The parser is compiled into a Rust binary and used via the Rust Tree-sitter bindings.
Install the Dependencies
Node is required to generate the grammar JSON. Install node:
sudo apt-get install nodejs
NOTE: This article assumes you have Rust and a C compiler installed.
Create a New Rust Project
cargo new tree-sitter-verilog-test
cd tree-sitter-verilog-test
git init
git add .
git commit -m "Initial commit"
Obtain the Grammar
A list of existing Tree-sitter grammars is available at https://tree-sitter.github.io/tree-sitter.
I’ll be using the Verilog grammar. Obtain the grammar:
git submodule add https://github.com/tree-sitter/tree-sitter-verilog.git
git commit -m "Add tree-sitter-verilog"
Generate the Parser
cd tree-sitter-verilog
npm install
This installs the Tree-sitter CLI and runs tree-sitter generate
which executes the grammar Javascript to generate the grammar JSON then generates the C-based parser.
The parser is written to the tree-sitter-verilog/src/
directory.
Compile the Parser
A Cargo build script is needed to compile and link the parser into the Rust binary.
We need the cc crate for compiling C code into our Rust binary.
Add the cc
crate to the build-dependencies
section of Cargo.toml
:
[build-dependencies]
cc = "1.0"
Create a build.rs
build script with the contents:
fn main() {
let language = "verilog";
let package = format!("tree-sitter-{}", language);
let source_directory = format!("{}/src", package);
let source_file = format!("{}/parser.c", source_directory);
println!("cargo:rerun-if-changed={}", source_file); // <1>
cc::Build::new()
.file(source_file)
.include(source_directory)
.compile(&package); // <2>
}
- Tells Cargo to only re-run the build script if the parser source has changed.
- Compiles the parser C code into the Rust binary.
NOTE: We could instead rerun on a change in the grammar Javascript and add a call to npm install
to fully automate the building of the parser.
Use the Parser
We’ll be using the parser via the Rust Tree-sitter bindings provided by the tree-sitter crate.
Add the tree-sitter
crate to the dependencies
section of Cargo.toml
:
[dependencies]
tree-sitter = "0.20.9"
Edit the contents of src/main.rs
to be the following:
use tree_sitter::{Parser, Language};
extern "C" { fn tree_sitter_verilog() -> Language; }
fn main() {
println!("Hello, world!");
}
#[test]
fn test_parser() {
let language = unsafe { tree_sitter_verilog() };
let mut parser = Parser::new();
parser.set_language(language).unwrap();
let source_code = "module mymodule(); endmodule";
let tree = parser.parse(source_code, None).unwrap();
assert_eq!(tree.root_node().to_sexp(), "(source_file (module_declaration (module_header (module_keyword) (module_identifier (simple_identifier))) (module_nonansi_header (list_of_ports))))");
}
Running cargo test
should then give the following output:
$ cargo test
Compiling tree-sitter-verilog-test v0.1.0 (/home/rfdonnelly/repos/tree-sitter-verilog-test)
Finished dev [unoptimized + debuginfo] target(s) in 6.27s
Running target/debug/deps/tree_sitter_verilog_test-3a23b31b32f84a74
running 1 test
test test_parser ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Finally commit the necessary files:
git add -f Cargo.lock Cargo.toml build.rs src/main.rs
git commit -m "Add Tree-sitter parser"