About this guide
This guide is meant to help document how rustc – the Rust compiler – works, as well as to help new contributors get involved in rustc development. It is not meant to replace code documentation – each chapter gives only high-level details – the kinds of things that (ideally) don't change frequently.
There are three parts to this guide. Part 1 contains information that should be useful no matter how you are contributing. Part 2 contains information about how the compiler works. Finally, there are some appendices at the end with useful reference information.
The guide itself is of course open-source as well, and the sources can be found at the GitHub repository. If you find any mistakes in the guide, please file an issue about it, or even better, open a PR with a correction!
Other places to find information
You might also find the following sites useful:
- Rustc API docs -- rustdoc documentation for the compiler
- Forge -- contains documentation about Rust infrastructure, team procedures, and more
- compiler-team -- the home base for the Rust compiler team, with a description of the team's procedures, active working groups, and the team calendar
Part 1: Building, Debugging, and Contributing to Rustc
This part of the guide contains information that should be useful to you no matter which part of the compiler you are working on. It includes both technical information and tips (e.g., how to build and debug the compiler) as well as information about processes in the Rust project (e.g., stability and information about the compiler team).
About the compiler team
rustc is maintained by the Rust compiler team. The people on this team collectively work to track regressions and implement new features. Members of the Rust compiler team are people who have made significant contributions to rustc and its design.
Discussion
Currently the compiler team chats in two places:
- The `t-compiler` stream on the Zulip instance
- The `compiler` channel on the rust-lang Discord
Expert map
If you're interested in figuring out who can answer questions about a particular part of the compiler, or you'd just like to know who works on what, check out our experts directory. It contains a listing of the various parts of the compiler and a list of people who are experts on each one.
Rust compiler meeting
The compiler team holds a weekly meeting where we do triage and focus on new bugs, regressions, and other issues. The overall plan for the meeting can be found in the rust compiler meeting etherpad. It works roughly as follows:
- Review P-high bugs: P-high bugs are important enough that we actively track their progress. Ideally, a P-high bug should always have an assignee.
- Look over new regressions: we then check for new cases where the compiler broke previously working code. Regressions are almost always marked P-high; the major exception is bug fixes (though even there we often try to give warnings first).
- Check I-nominated issues: these are issues that need feedback from the team.
- Check beta-nominations: these are changes nominated for backporting to beta.
The meeting currently takes place on Tuesdays at 10am Boston time (usually UTC-4, though daylight saving time sometimes makes things confusing).
The meeting is held over a "chat medium", currently Zulip.
Team membership
Membership in the Rust compiler team is typically offered to people who have been making significant contributions to the compiler over a long period. Membership is both a recognition and an obligation: compiler team members are generally expected to help with day-to-day maintenance as well as reviews and other work.
If you are interested in becoming a compiler team member, start by fixing bugs or getting involved in a working group. One good way to find bugs is to look for open issues tagged with the E-easy or E-mentor labels.
r+ rights
Once you have made a number of individual PRs to rustc, we will often grant you r+ privileges. This means that you have the right to instruct "bors" (the bot that manages which PRs merge into rustc) to merge a PR (here are some instructions for talking to bors).
The guidelines for reviewers are as follows:
- You are welcome to review any PR, regardless of who it is assigned to. However, do not r+ a PR unless:
  - You are confident in that part of the code.
  - You are confident that nobody else wants to review it first.
    - For example, people sometimes express a desire to review a PR before it lands, perhaps because it touches a particularly sensitive part of the code.
- Always be polite when reviewing: you are a representative of the Rust project, so people expect you to go above and beyond when it comes to the [Code of Conduct].
[Code of Conduct]: https://www.rust-lang.org/policies/code-of-conduct
high-five
Once you have r+ rights, you can also be added to the high-five rotation. high-five is the bot that assigns incoming PRs to reviewers. If you are added, you will be randomly selected to review PRs. If you are assigned a PR that you don't feel comfortable reviewing, you can leave a comment like `r? @so-and-so` to assign it to someone else. If you don't know who to pick, write `r? @nikomatsakis` to reassign it, and @nikomatsakis will assign someone for you.
Getting onto the high-five list is greatly appreciated, as it lowers the review burden for all of us! However, if you don't have time to give people timely feedback on their PRs, it may be better to stay off the list.
Full team membership
Full team membership is typically extended once someone has been making significant contributions to the Rust compiler for a while, ideally (but not necessarily) in multiple areas. Sometimes this means implementing a new feature, but just as important, perhaps more important, is having the time and willingness to help with day-to-day maintenance: fixing bugs, tracking regressions, and other less glamorous work.
How to Build and Run the Compiler
The compiler is built using a tool called `x.py`. You will need to have Python installed to run it. But before we get to that, if you're going to be hacking on `rustc`, you'll want to tweak the configuration of the compiler. The default configuration is oriented towards running the compiler as a user, not a developer.
Create a config.toml
To start, copy `config.toml.example` to `config.toml`:
> cd $RUST_CHECKOUT
> cp config.toml.example config.toml
Then you will want to open up the file and change the following settings (and possibly others, such as `llvm.ccache`):
[llvm]
# Enables LLVM assertions, which will check that the LLVM bitcode generated
# by the compiler is internally consistent. These are particularly helpful
# if you edit `codegen`.
assertions = true
[rust]
# This will make your build more parallel; it costs a bit of runtime
# performance perhaps (less inlining) but it's worth it.
codegen-units = 0
# This enables full debuginfo and debug assertions. The line debuginfo is also
# enabled by `debuginfo-level = 1`. Full debuginfo is also enabled by
# `debuginfo-level = 2`. Debug assertions can also be enabled with
# `debug-assertions = true`. Note that `debug = true` will make your build
# slower, so you may want to try individually enabling debuginfo and assertions
# or enable only line debuginfo which is basically free.
debug = true
What is `x.py`?
`x.py` is the script used to orchestrate the tooling in the `rustc` repository. It is the script that can build docs, run tests, and compile `rustc`. It is now the preferred way to build `rustc`, replacing the old makefiles. Below are the different ways to utilize `x.py` to effectively deal with the repo for various common tasks.
Running `x.py` and building a stage1 compiler
One thing to keep in mind is that `rustc` is a bootstrapping compiler. That is, since `rustc` is written in Rust, we need to use an older version of the compiler to compile the newer version. In particular, the newer version of the compiler and some of the artifacts needed to build it, such as `libstd` and other tooling, may use some unstable features internally, requiring a specific version which understands these unstable features.
The result is that compiling `rustc` is done in stages:
- Stage 0: the stage0 compiler is usually (you can configure `x.py` to use something else) the current beta `rustc` compiler and its associated dynamic libraries (which `x.py` will download for you). This stage0 compiler is then used only to compile `rustbuild`, `std`, `test`, and `rustc`. When compiling `test` and `rustc`, this stage0 compiler uses the freshly compiled `std`. There are two concepts at play here: a compiler (with its set of dependencies) and its 'target' or 'object' libraries (`std`, `test`, and `rustc`). Both are staged, but in a staggered manner.
- Stage 1: the code in your clone (for the new version) is then compiled with the stage0 compiler to produce the stage1 compiler. However, it was built with an older compiler (stage0), so to optimize the stage1 compiler we go to the next stage.
  - In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. In particular, the stage1 compiler itself was built by stage0 and hence not by the source in your working directory: this means that the symbol names used in the compiler source may not match the symbol names that would have been made by the stage1 compiler. This can be important when using dynamic linking (e.g., with derives). Sometimes this means that some tests don't work when run with stage1.
- Stage 2: we rebuild our stage1 compiler with itself to produce the stage2 compiler (i.e. it builds itself) to have all the latest optimizations. (By default, we copy the stage1 libraries for use by the stage2 compiler, since they ought to be identical.)
- (Optional) Stage 3: to sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.
A note on stage meanings
When running `x.py` you will see output such as:
Building stage0 std artifacts
Copying stage0 std from stage0
Building stage0 test artifacts
Copying stage0 test from stage0
Building stage0 compiler artifacts
Copying stage0 rustc from stage0
Building LLVM for x86_64-apple-darwin
Building stage0 codegen artifacts
Assembling stage1 compiler
Building stage1 std artifacts
Copying stage1 std from stage1
Building stage1 test artifacts
Copying stage1 test from stage1
Building stage1 compiler artifacts
Copying stage1 rustc from stage1
Building stage1 codegen artifacts
Assembling stage2 compiler
Uplifting stage1 std
Copying stage2 std from stage1
Generating unstable book md files
Building stage0 tool unstable-book-gen
Building stage0 tool rustbook
Documenting standalone
Building rustdoc for stage2
Documenting book redirect pages
Documenting stage2 std
Building rustdoc for stage1
Documenting stage2 test
Documenting stage2 whitelisted compiler
Documenting stage2 compiler
Documenting stage2 rustdoc
Documenting error index
Uplifting stage1 test
Copying stage2 test from stage1
Uplifting stage1 rustc
Copying stage2 rustc from stage1
Building stage2 tool error_index_generator
A deeper look into `x.py`'s phases can be seen in the build diagram (not reproduced here). Keep in mind that the diagram is a simplification: e.g. `rustdoc` can be built at different stages, the process is a bit different when passing flags such as `--keep-stage`, or if there are non-host targets.
The following tables indicate the outputs of various stage actions:
| Stage 0 Action | Output |
|---|---|
| `beta` extracted | `build/HOST/stage0` |
| `stage0` builds `bootstrap` | `build/bootstrap` |
| `stage0` builds `libstd` | `build/HOST/stage0-std/TARGET` |
| copy `stage0-std` (HOST only) | `build/HOST/stage0-sysroot/lib/rustlib/HOST` |
| `stage0` builds `libtest` with `stage0-sysroot` | `build/HOST/stage0-test/TARGET` |
| copy `stage0-test` (HOST only) | `build/HOST/stage0-sysroot/lib/rustlib/HOST` |
| `stage0` builds `rustc` with `stage0-sysroot` | `build/HOST/stage0-rustc/HOST` |
| copy `stage0-rustc` (except executable) | `build/HOST/stage0-sysroot/lib/rustlib/HOST` |
| build `llvm` | `build/HOST/llvm` |
| `stage0` builds `codegen` with `stage0-sysroot` | `build/HOST/stage0-codegen/HOST` |
| `stage0` builds `rustdoc` with `stage0-sysroot` | `build/HOST/stage0-tools/HOST` |

`--stage=0` stops here.
| Stage 1 Action | Output |
|---|---|
| copy (uplift) `stage0-rustc` executable to `stage1` | `build/HOST/stage1/bin` |
| copy (uplift) `stage0-codegen` to `stage1` | `build/HOST/stage1/lib` |
| copy (uplift) `stage0-sysroot` to `stage1` | `build/HOST/stage1/lib` |
| `stage1` builds `libstd` | `build/HOST/stage1-std/TARGET` |
| copy `stage1-std` (HOST only) | `build/HOST/stage1/lib/rustlib/HOST` |
| `stage1` builds `libtest` | `build/HOST/stage1-test/TARGET` |
| copy `stage1-test` (HOST only) | `build/HOST/stage1/lib/rustlib/HOST` |
| `stage1` builds `rustc` | `build/HOST/stage1-rustc/HOST` |
| copy `stage1-rustc` (except executable) | `build/HOST/stage1/lib/rustlib/HOST` |
| `stage1` builds `codegen` | `build/HOST/stage1-codegen/HOST` |

`--stage=1` stops here.
| Stage 2 Action | Output |
|---|---|
| copy (uplift) `stage1-rustc` executable | `build/HOST/stage2/bin` |
| copy (uplift) `stage1-sysroot` | `build/HOST/stage2/lib` and `build/HOST/stage2/lib/rustlib/HOST` |
| `stage2` builds `libstd` (except HOST?) | `build/HOST/stage2-std/TARGET` |
| copy `stage2-std` (not HOST targets) | `build/HOST/stage2/lib/rustlib/TARGET` |
| `stage2` builds `libtest` (except HOST?) | `build/HOST/stage2-test/TARGET` |
| copy `stage2-test` (not HOST targets) | `build/HOST/stage2/lib/rustlib/TARGET` |
| `stage2` builds `rustdoc` | `build/HOST/stage2-tools/HOST` |
| copy `rustdoc` | `build/HOST/stage2/bin` |

`--stage=2` stops here.
Note that the convention `x.py` uses is that:
- A "stage N artifact" is an artifact that is produced by the stage N compiler.
- The "stage (N+1) compiler" is assembled from "stage N artifacts".
- A `--stage N` flag means build with stage N.
In short, stage 0 uses the stage0 compiler to create stage0 artifacts which will later be uplifted to stage1.
Every time any of the main artifacts (`std`, `test`, `rustc`) are compiled, two steps are performed.
When `std` is compiled by a stage N compiler, that `std` will be linked to programs built by the stage N compiler (including `test` and `rustc` built later on). It will also be used by the stage (N+1) compiler to link against itself. This is somewhat intuitive if one thinks of the stage (N+1) compiler as "just" another program we are building with the stage N compiler. In some ways, `rustc` (the binary, not the `rustbuild` step) could be thought of as one of the few `no_core` binaries out there.
So "stage0 std artifacts" are in fact the output of the downloaded stage0
compiler, and are going to be used for anything built by the stage0 compiler:
e.g. rustc
, test
artifacts. When it announces that it is "building stage1
std artifacts" it has moved on to the next bootstrapping phase. This pattern
continues in latter stages.
Also note that building host `std` and target `std` are different based on the stage (e.g. see in the table how stage2 only builds non-host `std` targets). This is because during stage2, the host `std` is uplifted from the "stage 1" `std` -- specifically, when "Building stage 1 artifacts" is announced, it is later copied into stage2 as well (both the compiler's `libdir` and the `sysroot`).
This `std` is pretty much necessary for any useful work with the compiler. Specifically, it's used as the `std` for programs compiled by the newly compiled compiler (so when you compile `fn main() { }` it is linked to the last `std` compiled with `x.py build --stage 1 src/libstd`).
The `rustc` generated by the stage0 compiler is linked to the freshly-built `libstd`, which means that for the most part only `std` needs to be cfg-gated, so that `rustc` can use features added to `std` immediately after their addition, without waiting for them to reach the downloaded beta. The `libstd` built by the `stage1/bin/rustc` compiler, also known as the "stage1 std artifacts", is not necessarily ABI-compatible with that compiler. That is, the `rustc` binary most likely could not use this `std` itself. It is however ABI-compatible with any programs that the `stage1/bin/rustc` binary builds (including itself), so in that sense they're paired.
This is also where `--keep-stage 1 src/libstd` comes into play. Since most changes to the compiler don't actually change the ABI, once you've produced a `libstd` in stage 1, you can probably just reuse it with a different compiler. If the ABI hasn't changed, you're good to go, no need to spend the time recompiling that `std`. The `--keep-stage` flag simply assumes the previous compile is fine and copies those artifacts into the appropriate place, skipping the cargo invocation.
The reason we first build `std`, then `test`, then `rustc`, is largely just because we want to minimize `cfg(stage0)` in the code for `rustc`. Currently `rustc` is always linked against a "new" `std`/`test` so it doesn't ever need to be concerned with differences in `std`; it can assume that the `std` is as fresh as possible.
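To make that concrete, here is a minimal sketch of stage gating. The `stage0` cfg is the one mentioned above; the function and both bodies are hypothetical, purely for illustration:

// Hypothetical example: suppose the in-tree compiler just gained support
// for something that the downloaded beta (stage0) compiler lacks.

#[cfg(not(stage0))]
pub fn count_bits(x: u64) -> u32 {
    // Only the freshly built compiler compiles this version.
    x.count_ones()
}

#[cfg(stage0)]
pub fn count_bits(x: u64) -> u32 {
    // Portable fallback that the older stage0 compiler can still build.
    let mut n = 0;
    let mut x = x;
    while x != 0 {
        n += (x & 1) as u32;
        x >>= 1;
    }
    n
}

Keeping such gating confined to `std`, rather than scattering it through `rustc`, is exactly the motivation described above.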
The reason we need to build it twice is ABI compatibility. The beta compiler has its own ABI, and then the `stage1/bin/rustc` compiler will produce programs/libraries with the new ABI. We used to build three times, but because we assume that the ABI is constant within a codebase, we presume that the libraries produced by the "stage2" compiler (produced by the `stage1/bin/rustc` compiler) are ABI-compatible with the libraries produced by the `stage1/bin/rustc` compiler. What this means is that we can skip that final compilation -- and simply use the same libraries as the `stage2/bin/rustc` compiler uses itself for programs it links against. This `stage2/bin/rustc` compiler is shipped to end-users, along with the stage 1 `{std,test,rustc}` artifacts.
If you want to learn more about `x.py`, read its README.md here.
Build Flags
There are other flags you can pass to the build command of `x.py` that can help cut down compile times or fit other needs. They are:
Options:
-v, --verbose use verbose output (-vv for very verbose)
-i, --incremental use incremental compilation
--config FILE TOML configuration file for build
--build BUILD build target of the stage0 compiler
--host HOST host targets to build
--target TARGET target targets to build
--on-fail CMD command to run on failure
--stage N stage to build
--keep-stage N stage to keep without recompiling
--src DIR path to the root of the rust checkout
-j, --jobs JOBS number of jobs to run in parallel
-h, --help print this help message
For hacking, often building the stage 1 compiler is enough, but for final testing and release, the stage 2 compiler is used.
`./x.py check` is a really fast way to check that the compiler can build. It is, in particular, very useful when you're doing some kind of "type-based refactoring", like renaming a method, or changing the signature of some function.
Once you've created a `config.toml`, you are now ready to run `x.py`. There are a lot of options here, but let's start with what is probably the best "go to" command for building a local rust:
> ./x.py build -i --stage 1 src/libstd
This may look like it only builds libstd, but that is not the case. What this command does is the following:
- Build `libstd` using the stage0 compiler (using incremental)
- Build `librustc` using the stage0 compiler (using incremental)
  - This produces the stage1 compiler
- Build `libstd` using the stage1 compiler (cannot use incremental)

This final product (stage1 compiler + libs built using that compiler) is what you need to build other rust programs (unless you use `#![no_std]` or `#![no_core]`).
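As an aside, a crate that opts out of the standard library looks roughly like the sketch below (hedged: the required lang items, panic strategy, and entry-point symbol vary by target; `_start` is typical for a freestanding Linux binary). Such a program never links the freshly built `libstd` at all:

// A minimal `#![no_std]` binary; the staging story for `std` above
// does not apply to it.
#![no_std]
#![no_main]

use core::panic::PanicInfo;

// `core` still requires the program to supply a panic handler.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

// With `#![no_main]`, we provide the entry point ourselves.
#[no_mangle]
pub extern "C" fn _start() -> ! {
    loop {}
}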
The command includes the `-i` switch, which enables incremental compilation. This will be used to speed up the first two steps of the process: in particular, if you make a small change, we ought to be able to use your old results to make producing the stage1 compiler faster.
Unfortunately, incremental cannot be used to speed up making the stage1 libraries. This is because incremental only works when you run the same compiler twice in a row. In this case, we are building a new stage1 compiler every time. Therefore, the old incremental results may not apply. As a result, you will probably find that building the stage1 `libstd` is a bottleneck for you -- but fear not, there is a (hacky) workaround. See the section on "recommended workflows" below.
Note that this whole command just gives you a subset of the full `rustc` build. The full `rustc` build (what you get if you just say `./x.py build`) has quite a few more steps:
- Build `librustc` and `rustc` with the stage1 compiler.
  - The resulting compiler here is called the "stage2" compiler.
- Build `libstd` with the stage2 compiler.
- Build `librustdoc` and a bunch of other things with the stage2 compiler.
Build specific components
Build only the libcore library
> ./x.py build src/libcore
Build the libcore and libproc_macro library only
> ./x.py build src/libcore src/libproc_macro
Build only libcore up to Stage 1
> ./x.py build src/libcore --stage 1
Sometimes you might just want to test if the part you're working on can compile. Using these commands, you can test that it compiles before doing a bigger build to make sure it works with the compiler. As shown before, you can also pass flags at the end, such as `--stage`.
Creating a rustup toolchain
Once you have successfully built `rustc`, you will have created a bunch of files in your `build` directory. In order to actually run the resulting `rustc`, we recommend creating rustup toolchains. The first one will run the stage1 compiler (which we built above). The second will execute the stage2 compiler (which we did not build, but which you will likely need to build at some point; for example, if you want to run the entire test suite).
> rustup toolchain link stage1 build/<host-triple>/stage1
> rustup toolchain link stage2 build/<host-triple>/stage2
The `<host-triple>` would typically be one of the following:
- Linux: `x86_64-unknown-linux-gnu`
- Mac: `x86_64-apple-darwin`
- Windows: `x86_64-pc-windows-msvc`
Now you can run the `rustc` you built. If you run it with `-vV`, you should see a version number ending in `-dev`, indicating a build from your local environment:
> rustc +stage1 -vV
rustc 1.25.0-dev
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.25.0-dev
LLVM version: 4.0
Suggested workflows for faster builds of the compiler
There are two workflows that are useful for faster builds of the compiler.
Check, check, and check again. The first workflow, which is useful when doing simple refactorings, is to run `./x.py check` continuously. Here you are just checking that the compiler can build, but often that is all you need (e.g., when renaming a method). You can then run `./x.py build` when you actually need to run tests.
In fact, it is sometimes useful to put off tests even when you are not 100% sure the code will work. You can then keep building up refactoring commits and only run the tests at some later time. You can then use `git bisect` to track down precisely which commit caused the problem. A nice side-effect of this style is that you are left with a fairly fine-grained set of commits at the end, all of which build and pass tests. This often helps reviewing.
Incremental builds with `--keep-stage`. Sometimes just checking whether the compiler builds is not enough. A common example is that you need to add a `debug!` statement to inspect the value of some state or better understand the problem. In that case, you really need a full build. By leveraging incremental, though, you can often get these builds to complete very fast (e.g., around 30 seconds): the only catch is this requires a bit of fudging and may produce compilers that don't work (but that is easily detected and fixed).
The sequence of commands you want is as follows:
- Initial build: `./x.py build -i --stage 1 src/libstd`
  - As documented above, this will build a functional stage1 compiler as part of running all stage0 commands (which include building a `libstd` compatible with the stage1 compiler) as well as the first few steps of the "stage 1 actions" up to "stage1 (sysroot stage1) builds libstd".
- Subsequent builds: `./x.py build -i --stage 1 src/libstd --keep-stage 1`
  - Note that we added the `--keep-stage 1` flag here
As mentioned, the effect of `--keep-stage 1` is that we just assume that the old standard library can be re-used. If you are editing the compiler, this is almost always true: you haven't changed the standard library, after all. But sometimes, it's not: for example, if you are editing the "metadata" part of the compiler, which controls how the compiler encodes types and other states into the `rlib` files, or if you are editing things that wind up in the metadata (such as the definition of the MIR).
The TL;DR is that you might get weird behavior from a compile when using `--keep-stage 1` -- for example, strange ICEs or other panics. In that case, you should simply remove the `--keep-stage 1` from the command and rebuild. That ought to fix the problem.
You can also use `--keep-stage 1` when running tests. Something like this:
- Initial test run: `./x.py test -i --stage 1 src/test/ui`
- Subsequent test run: `./x.py test -i --stage 1 src/test/ui --keep-stage 1`
Other `x.py` commands
Here are a few other useful `x.py` commands. We'll cover some of them in detail in other sections:
- Building things:
  - `./x.py clean` – clean up the build directory (`rm -rf build` works too, but then you have to rebuild LLVM)
  - `./x.py build --stage 1` – builds everything using the stage 1 compiler, not just up to libstd
  - `./x.py build` – builds the stage2 compiler
- Running tests (see the section on running tests for more details):
  - `./x.py test --stage 1 src/libstd` – runs the `#[test]` tests from libstd
  - `./x.py test --stage 1 src/test/run-pass` – runs the `run-pass` test suite
  - `./x.py test --stage 1 src/test/ui/const-generics` – runs all the tests in the `const-generics/` subdirectory of the `ui` test suite
  - `./x.py test --stage 1 src/test/ui/const-generics/const-types.rs` – runs the single test `const-types.rs` from the `ui` test suite
ctags
One of the challenges with rustc is that the RLS can't handle it, since it's a bootstrapping compiler. This makes code navigation difficult. One solution is to use `ctags`.
`ctags` has a long history and several variants. Exuberant Ctags seems to be quite commonly distributed, but it does not have out-of-the-box Rust support. Some distributions use Universal Ctags instead, which is a maintained fork and does have built-in Rust support.
The following script can be used to set up Exuberant Ctags: https://github.com/nikomatsakis/rust-etags.
`ctags` integrates into emacs and vim quite easily. The following can then be used to build and generate tags:
$ rust-ctags src/lib* && ./x.py build <something>
This allows you to do "jump-to-def" with whatever functions were around when you last built, which is ridiculously useful.
Cleaning out build directories
Sometimes you need to start fresh, but this is normally not the case. If you find yourself needing to do this, rustbuild is most likely not acting right and you should file a bug describing what is going wrong. When you do need to clean everything up, you only need to run one command!
> ./x.py clean
Compiler Documentation
The documentation for the Rust components can be found at rustc doc.
Build distribution artifacts
You might want to build and package up the compiler for distribution. You’ll want to run this command to do it:
./x.py dist
Install distribution artifacts
If you’ve built a distribution artifact you might want to install it and test that it works on your target system. You’ll want to run this command:
./x.py install
Note: if you are testing out a modification to the compiler, you might want to use it to compile some project. Usually, you do not want to use `./x.py install` for testing. Rather, you should create a toolchain as discussed above.
For example, if the toolchain you created is called "foo", you would then invoke it with `rustc +foo ...` (where `...` represents the rest of the arguments).
Documenting rustc
You might want to build documentation for the various components available, like the standard library. There are two ways to go about this. You can run `rustdoc` directly on the file to make sure the HTML is correct, which is fast. Alternatively, you can build the documentation as part of the build process through `x.py`. Both are viable methods, since what matters is the content.
Document everything
./x.py doc
If you want to avoid the whole Stage 2 build
./x.py doc --stage 1
First, the compiler and rustdoc get built to make sure everything is okay, and then the files are documented.
Document specific components
./x.py doc src/doc/book
./x.py doc src/doc/nomicon
./x.py doc src/doc/book src/libstd
Much like individual tests or building certain components you can build only the documentation you want.
Document internal rustc items
Compiler documentation is not built by default; there is a flag in `config.toml` to enable it. When enabled, the built documentation does include internal items.
Next, open up `config.toml` and make sure these two lines are set to true:
docs = true
compiler-docs = true
When you want to build the compiler docs as well, run this command:
./x.py doc
This will see that the docs and compiler-docs options are set to true and build the normally hidden compiler docs!
The compiler testing framework
The Rust project runs a wide variety of different tests, orchestrated by the build system (`x.py test`). The main test harness for testing the compiler itself is a tool called compiletest (sources in `src/tools/compiletest`). This section gives a brief overview of how the testing framework is set up, and then gets into some of the details on how to run tests as well as how to add new tests.
Compiletest test suites
The compiletest tests are located in the tree in the `src/test` directory. Immediately within you will see a series of subdirectories (e.g. `ui`, `run-make`, and so forth). Each of those directories is called a test suite – they house a group of tests that are run in a distinct mode.
Here is a brief summary of the test suites as of this writing and what they mean. In some cases, the test suites are linked to parts of the manual that give more details.
- `ui` – tests that check the exact stdout/stderr from compilation and/or running the test
- `run-pass` – tests that are expected to compile and execute successfully (no panics)
- `run-pass-valgrind` – tests that ought to run with valgrind
- `run-fail` – tests that are expected to compile but then panic during execution
- `compile-fail` – tests that are expected to fail compilation
- `parse-fail` – tests that are expected to fail to parse
- `pretty` – tests targeting the Rust "pretty printer", which generates valid Rust code from the AST
- `debuginfo` – tests that run in gdb or lldb and query the debug info
- `codegen` – tests that compile and then test the generated LLVM code to make sure that the optimizations we want are taking effect
- `assembly` – similar to `codegen` tests, but verifies assembly output to make sure the LLVM target backend can handle provided code
- `mir-opt` – tests that check parts of the generated MIR to make sure we are building things correctly or doing the optimizations we expect
- `incremental` – tests for incremental compilation, checking that when certain modifications are performed, we are able to reuse the results from previous compilations
- `run-make` – tests that basically just execute a `Makefile`; the ultimate in flexibility but quite annoying to write
- `rustdoc` – tests for rustdoc, making sure that the generated files contain the expected documentation
- `*-fulldeps` – same as above, but indicates that the test depends on things other than `libstd` (and hence those things must be built)
Other Tests
The Rust build system handles running tests for various other things, including:
- Tidy – This is a custom tool used for validating source code style and formatting conventions, such as rejecting long lines. There is more information in the section on coding conventions.
  Example: `./x.py test src/tools/tidy`
- Unit tests – The Rust standard library and many of the Rust packages include typical Rust `#[test]` unit tests. Under the hood, `x.py` will run `cargo test` on each package to run all the tests.
  Example: `./x.py test src/libstd`
- Doc tests – Example code embedded within Rust documentation is executed via `rustdoc --test`. Examples:
  - `./x.py test src/doc` – Runs `rustdoc --test` for all documentation in `src/doc`.
  - `./x.py test --doc src/libstd` – Runs `rustdoc --test` on the standard library.
- Link checker – A small tool for verifying `href` links within documentation.
  Example: `./x.py test src/tools/linkchecker`
- Dist check – This verifies that the source distribution tarball created by the build system will unpack, build, and run all tests.
  Example: `./x.py test distcheck`
- Tool tests – Packages that are included with Rust have all of their tests run as well (typically by running `cargo test` within their directory). This includes things such as cargo, clippy, rustfmt, rls, miri, bootstrap (testing the Rust build system itself), etc.
- Cargo test – This is a small tool which runs `cargo test` on a few significant projects (such as `servo`, `ripgrep`, `tokei`, etc.) just to ensure there aren't any significant regressions.
  Example: `./x.py test src/tools/cargotest`
Testing infrastructure
When a Pull Request is opened on GitHub, Travis will automatically launch a build that will run all tests on a single configuration (x86-64 linux). In essence, it runs `./x.py test` after building.
The integration bot bors is used for coordinating merges to the master branch. When a PR is approved, it goes into a queue where merges are tested one at a time on a wide set of platforms using Travis and Appveyor (currently over 50 different configurations). Most platforms only run the build steps, some run a restricted set of tests, only a subset run the full suite of tests (see Rust's platform tiers).
Testing with Docker images
The Rust tree includes Docker image definitions for the platforms used on Travis in src/ci/docker. The script src/ci/docker/run.sh is used to build the Docker image, run it, build Rust within the image, and run the tests.
TODO: What is a typical workflow for testing/debugging on a platform that you don't have easy access to? Do people build Docker images and enter them to test things out?
Testing on emulators
Some platforms are tested via an emulator for architectures that aren't readily available. There is a set of tools for orchestrating running the tests within the emulator. Platforms such as `arm-android` and `arm-unknown-linux-gnueabihf` are set up to automatically run the tests under emulation on Travis. The following will take a look at how a target's tests are run under emulation.
The Docker image for armhf-gnu includes QEMU to emulate the ARM CPU architecture. Included in the Rust tree are the tools remote-test-client and remote-test-server, which are programs for sending test programs and libraries to the emulator, running the tests within the emulator, and reading the results. The Docker image is set up to launch `remote-test-server`, and the build tools use `remote-test-client` to communicate with the server to coordinate running tests (see src/bootstrap/test.rs).
TODO: What are the steps for manually running tests within an emulator?
`./src/ci/docker/run.sh armhf-gnu` will do everything, but takes hours to run and doesn't offer much help with interacting within the emulator.
Is there any support for emulating other (non-Android) platforms, such as running on an iOS emulator?
Is there anything else interesting that can be said here about running tests remotely on real hardware?
It's also unclear to me how the wasm or asm.js tests are run.
Crater
Crater is a tool for compiling and running tests for every crate on crates.io (and a few on GitHub). It is mainly used for checking the extent of breakage when implementing potentially breaking changes, and for verifying the absence of breakage by comparing beta and stable compiler versions.
When to run Crater
You should request a crater run if your PR makes large changes to the compiler or could cause breakage. If you are unsure, feel free to ask your PR's reviewer.
Requesting Crater Runs
The rust team maintains a few machines that can be used for running crater runs on the changes introduced by a PR. If your PR needs a crater run, leave a comment for the triage team in the PR thread. Please inform the team whether you require a "check-only" crater run, a "build only" crater run, or a "build-and-test" crater run. The difference is primarily in time; the conservative (if you're not sure) option is to go for the build-and-test run. If making changes that will only have an effect at compile-time (e.g., implementing a new trait) then you only need a check run.
Your PR will be enqueued by the triage team and the results will be posted when they are ready. Check runs take around 3-4 days, with the other two taking 5-6 days on average.
While crater is really useful, it is also important to be aware of a few caveats:
- Not all code is on crates.io! There is a lot of code in repos on GitHub and elsewhere. Also, companies may not wish to publish their code. Thus, a successful crater run is not a magically green light that there will be no breakage; you still need to be careful.
- Crater only runs Linux builds on x86_64. Thus, other architectures and platforms are not tested. Critically, this includes Windows.
- Many crates are not tested. This could be for a lot of reasons, including that the crate doesn't compile any more (e.g. it used old nightly features), has broken or flaky tests, requires network access, or other reasons.
- Before crater can be run, `@bors try` needs to succeed in building artifacts. This means that if your code doesn't compile, you cannot run crater.
Perf runs
A lot of work is put into improving the performance of the compiler and preventing performance regressions. A "perf run" is used to compare the performance of the compiler in different configurations for a large collection of popular crates. Different configurations include "fresh builds", builds with incremental compilation, etc.
The result of a perf run is a comparison between two versions of the compiler (by their commit hashes).
You should request a perf run if your PR may affect performance, especially if it can affect performance adversely.
Further reading
The following blog posts may also be of interest:
- brson's classic "How Rust is tested"
Running tests
You can run the tests using `x.py`. The most basic command – which you will almost never want to use! – is as follows:
> ./x.py test
This will build the full stage 2 compiler and then run the whole test suite. You probably don't want to do this very often, because it takes a very long time, and anyway bors / travis will do it for you. (Often, I will run this command in the background after opening a PR that I think is done, but rarely otherwise. -nmatsakis)
The test results are cached and previously successful tests are ignored during testing. The stdout/stderr contents as well as a timestamp file for every test can be found under `build/ARCH/test/`. To force-rerun a test (e.g. in case the test runner fails to notice a change) you can simply remove the timestamp file.
Note that some tests require a Python-enabled gdb. You can test if your gdb install supports Python by using the `python` command from within gdb. Once invoked you can type some Python code (e.g. `print("hi")`) followed by return and then `CTRL+D` to execute it. If you are building gdb from source, you will need to configure with `--with-python=<path-to-python-binary>`.
Running a subset of the test suites
When working on a specific PR, you will usually want to run a smaller set of tests, and with a stage 1 build. For example, a good "smoke test" that can be used after modifying rustc to see if things are generally working correctly would be the following:
> ./x.py test --stage 1 src/test/{ui,compile-fail,run-pass}
This will run the `ui`, `compile-fail`, and `run-pass` test suites,
and only with the stage 1 build. Of course, the choice of test suites
is somewhat arbitrary, and may not suit the task you are doing. For
example, if you are hacking on debuginfo, you may be better off with
the debuginfo test suite:
> ./x.py test --stage 1 src/test/debuginfo
If you only need to test a specific subdirectory of tests for any given test suite, you can pass that directory to `x.py test`:
> ./x.py test --stage 1 src/test/ui/const-generics
Likewise, you can test a single file by passing its path:
> ./x.py test --stage 1 src/test/ui/const-generics/const-test.rs
Run only the tidy script
> ./x.py test src/tools/tidy
Run tests on the standard library
> ./x.py test src/libstd
Run tests on the standard library and run the tidy script
> ./x.py test src/libstd src/tools/tidy
Run tests on the standard library using a stage 1 compiler
> ./x.py test src/libstd --stage 1
By listing which test suites you want to run you avoid having to run tests for components you did not change at all.
Warning: Note that bors only runs the tests with the full stage 2 build; therefore, while the tests usually work fine with stage 1, there are some limitations. In particular, the stage1 compiler doesn't work well with procedural macros or custom derive tests.
Running an individual test
Another common thing that people want to do is to run an individual
test, often the test they are trying to fix. As mentioned earlier,
you may pass the full file path to achieve this, or alternatively one may invoke `x.py` with the `--test-args` option:
> ./x.py test --stage 1 src/test/ui --test-args issue-1234
Under the hood, the test runner invokes the standard rust test runner (the same one you get with `#[test]`), so this command would wind up filtering for tests that include "issue-1234" in the name. (Thus `--test-args` is a good way to run a collection of related tests.)
Using incremental compilation
You can further enable the `--incremental` flag to save additional time in subsequent rebuilds:
> ./x.py test --stage 1 src/test/ui --incremental --test-args issue-1234
If you don't want to include the flag with every command, you can enable it in the `config.toml`, too:
# Whether to always use incremental compilation when building rustc
incremental = true
Note that incremental compilation will use more disk space than usual. If disk space is a concern for you, you might want to check the size of the `build` directory from time to time.
Running tests manually
Sometimes it's easier and faster to just run the test by hand. Most tests are just `.rs` files, so you can do something like
> rustc +stage1 src/test/ui/issue-1234.rs
This is much faster, but doesn't always work. For example, some tests include directives that specify specific compiler flags, or which rely on other crates, and they may not run the same without those options.
Adding new tests
In general, we expect every PR that fixes a bug in rustc to come accompanied by a regression test of some kind. This test should fail in master but pass after the PR. These tests are really useful for preventing us from repeating the mistakes of the past.
To add a new test, the first thing you generally do is to create a file, typically a Rust source file. Test files have a particular structure:
- They should have some kind of comment explaining what the test is about;
- next, they can have one or more header commands, which are special comments that the test interpreter knows how to interpret;
- finally, they have the Rust source. This may have various error annotations which indicate expected compilation errors or warnings.
Depending on the test suite, there may be some other details to be aware of:
- For the `ui` test suite, you need to generate reference output files.
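Putting those pieces together, a complete test file tends to look like the following sketch. The issue number and diagnostic here are invented for illustration; an error annotation only has to match a substring of the real compiler output:

// Regression test for hypothetical issue #12345: type errors in `let`
// bindings should be reported against the initializer expression.

// compile-flags: -C debug-assertions

fn main() {
    let x: u32 = "hello"; //~ ERROR mismatched types
}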
What kind of test should I add?
It can be difficult to know what kind of test to use. Here are some rough heuristics:
- Some tests have specialized needs:
  - need to run gdb or lldb? use the `debuginfo` test suite
  - need to inspect LLVM IR or MIR IR? use the `codegen` or `mir-opt` test suites
  - need to run rustdoc? Prefer a `rustdoc` test
  - need to inspect the resulting binary in some way? Then use `run-make`
- For most other things, a `ui` (or `ui-fulldeps`) test is to be preferred:
  - `ui` tests subsume run-pass, compile-fail, and parse-fail tests
  - in the case of warnings or errors, `ui` tests capture the full output, which makes it easier to review but also helps prevent "hidden" regressions in the output
Naming your test
We have not traditionally had a lot of structure in the names of tests. Moreover, for a long time, the rustc test runner did not support subdirectories (it now does), so test suites like `src/test/run-pass` have a huge mess of files in them. This is not considered an ideal setup.
For regression tests – basically, some random snippet of code that came in from the internet – we often just name the test after the issue. For example, `src/test/run-pass/issue-12345.rs`. If possible, though, it is better if you can put the test into a directory that helps identify what piece of code is being tested here (e.g., `borrowck/issue-12345.rs` is much better), or perhaps give it a more meaningful name. Still, do include the issue number somewhere.
When writing a new feature, create a subdirectory to store your tests. For example, if you are implementing RFC 1234 ("Widgets"), then it might make sense to put the tests in directories like:
- `src/test/ui/rfc1234-widgets/`
- `src/test/run-pass/rfc1234-widgets/`
- etc.
In other cases, there may already be a suitable directory. (The proper directory structure to use is actually an area of active debate.)
Comment explaining what the test is about
When you create a test file, include a comment summarizing the point of the test at the start of the file. This should highlight which parts of the test are more important, and what the bug was that the test is fixing. Citing an issue number is often very helpful.
This comment doesn't have to be super extensive. Just something like "Regression test for #18060: match arms were matching in the wrong order." might already be enough.
These comments are very useful to others later on when your test breaks, since they often can highlight what the problem is. They are also useful if for some reason the tests need to be refactored, since they let others know which parts of the test were important (often a test must be rewritten because it no longer tests what it was meant to test, and then it's useful to know what exactly it was meant to test).
Header commands: configuring rustc
Header commands are special comments that the test runner knows how to interpret. They must appear before the Rust source in the test. They are normally put after the short comment that explains the point of this test. For example, this test uses the `// compile-flags` command to specify a custom flag to give to rustc when the test is compiled:
// Test the behavior of `0 - 1` when overflow checks are disabled.
// compile-flags: -Coverflow-checks=off
fn main() {
let x = 0 - 1;
...
}
Ignoring tests
These are used to ignore the test in some situations, which means the test won't be compiled or run.
- `ignore-X` where `X` is a target detail or stage will ignore the test accordingly (see below)
- `only-X` is like `ignore-X`, but will only run the test on that target or stage
- `ignore-pretty` will not compile the pretty-printed test (this is done to test the pretty-printer, but might not always work)
- `ignore-test` always ignores the test
- `ignore-lldb` and `ignore-gdb` will skip a debuginfo test on that debugger
- `ignore-gdb-version` can be used to ignore the test when certain gdb versions are used
Some examples of `X` in `ignore-X`:
- Architecture: `aarch64`, `arm`, `asmjs`, `mips`, `wasm32`, `x86_64`, `x86`, ...
- OS: `android`, `emscripten`, `freebsd`, `ios`, `linux`, `macos`, `windows`, ...
- Environment (fourth word of the target triple): `gnu`, `msvc`, `musl`.
- Pointer width: `32bit`, `64bit`.
- Stage: `stage0`, `stage1`, `stage2`.
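For example, a test that checks Unix-style path output and only makes sense on 64-bit x86 might start with the following headers (a hypothetical combination, purely for illustration):

// ignore-windows (the expected output contains Unix path separators)
// only-x86_64

fn main() {
    // ... test body ...
}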
Other Header Commands
Here is a list of other header commands. This list is not exhaustive. Header commands can generally be found by browsing the `TestProps` structure found in `header.rs` from the compiletest source.
- `run-rustfix` for UI tests, indicates that the test produces structured suggestions. The test writer should create a `.fixed` file, which contains the source with the suggestions applied. When the test is run, compiletest first checks that the correct lint/warning is generated. Then, it applies the suggestion and compares against `.fixed` (they must match). Finally, the fixed source is compiled, and this compilation is required to succeed. The `.fixed` file can also be generated automatically with the `--bless` option, discussed below.
- `min-gdb-version` specifies the minimum gdb version required for this test; see also `ignore-gdb-version`
- `min-lldb-version` specifies the minimum lldb version required for this test
- `rust-lldb` causes the lldb part of the test to only be run if the lldb in use contains the Rust plugin
- `no-system-llvm` causes the test to be ignored if the system llvm is used
- `min-llvm-version` specifies the minimum llvm version required for this test
- `min-system-llvm-version` specifies the minimum system llvm version required for this test; the test is ignored if the system llvm is in use and it doesn't meet the minimum version. This is useful when an llvm feature has been backported to rust-llvm
- `ignore-llvm-version` can be used to skip the test when certain LLVM versions are used. This takes one or two arguments; the first argument is the first version to ignore. If no second argument is given, all subsequent versions are ignored; otherwise, the second argument is the last version to ignore.
- `compile-pass` for UI tests, indicates that the test is supposed to compile, as opposed to the default where the test is supposed to error out.
- `compile-flags` passes extra command-line args to the compiler, e.g. `compile-flags -g` which forces debuginfo to be enabled.
- `should-fail` indicates that the test should fail; used for "meta testing", where we test the compiletest program itself to check that it will generate errors in appropriate scenarios. This header is ignored for pretty-printer tests.
- `gate-test-X` where `X` is a feature marks the test as a "gate test" for feature X. Such tests are supposed to ensure that the compiler errors when usage of a gated feature is attempted without the proper `#![feature(X)]` tag. Each unstable lang feature is required to have a gate test.
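As an illustration of the last point, a gate test is just an ordinary test that uses the gated feature without enabling it. The sketch below is modeled on the real `box_syntax` gate test, though the exact diagnostic text is hedged and annotations match on substrings anyway:

// gate-test-box_syntax
// Ensure that `box` expressions are rejected without the feature gate.

fn main() {
    let _x = box 42; //~ ERROR box expression syntax is experimental
}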
Error annotations
Error annotations specify the errors that the compiler is expected to emit. They are "attached" to the line in source where the error is located.
- `~`: Associates the following error level and message with the current line
- `~|`: Associates the following error level and message with the same line as the previous comment
- `~^`: Associates the following error level and message with the previous line. Each caret (`^`) that you add adds a line to this, so `~^^^^^^^` is seven lines up.
The error levels that you can have are: `ERROR`, `WARNING`, `NOTE`, `HELP`, and `SUGGESTION`*.
* Note: `SUGGESTION` must follow immediately after `HELP`.
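Putting the annotation forms together, here is a short sketch (the diagnostic wording is illustrative; annotations match on substrings of the actual output):

fn main() {
    let x: u32 = "hello"; //~ ERROR mismatched types

    let y: bool = 42;
    //~^ ERROR mismatched types
}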
Revisions
Certain classes of tests support "revisions" (as of the time of this writing, this includes run-pass, compile-fail, run-fail, and incremental, though incremental tests are somewhat different). Revisions allow a single test file to be used for multiple tests. This is done by adding a special header at the top of the file:
// revisions: foo bar baz
This will result in the test being compiled (and tested) three times: once with `--cfg foo`, once with `--cfg bar`, and once with `--cfg baz`. You can therefore use `#[cfg(foo)]` etc. within the test to tweak each of these results.
You can also customize headers and expected error messages to a particular revision. To do this, add `[foo]` (or `bar`, `baz`, etc) after the `//` comment, like so:
// A flag to pass in only for cfg `foo`:
//[foo]compile-flags: -Z verbose

#[cfg(foo)]
fn test_foo() {
    let x: usize = 32_u32; //[foo]~ ERROR mismatched types
}
Note that not all headers have meaning when customized to a revision. For example, the `ignore-test` header (and all "ignore" headers) currently only apply to the test as a whole, not to particular revisions. The only headers that are intended to really work when customized to a revision are error patterns and compiler flags.
Guide to the UI tests
The UI tests are intended to capture the compiler's complete output, so that we can test all aspects of the presentation. They work by compiling a file (e.g., `ui/hello_world/main.rs`), capturing the output, and then applying some normalization (see below). This normalized result is then compared against reference files named `ui/hello_world/main.stderr` and `ui/hello_world/main.stdout`. If either of those files doesn't exist, the output must be empty (that is actually the case for this particular test). If the test run fails, we will print out the current output, but it is also saved in `build/<target-triple>/test/ui/hello_world/main.stdout` (this path is printed as part of the test failure message), so you can run `diff` and so forth.
Tests that do not result in compile errors
By default, a UI test is expected not to compile (in which case, it should contain at least one `//~ ERROR` annotation). However, you can also make UI tests where compilation is expected to succeed, and you can even run the resulting program. Just add one of the following header commands:
- `// compile-pass` – compilation should succeed but do not run the resulting binary
- `// run-pass` – compilation should succeed and we should run the resulting binary
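For instance, a minimal `run-pass` UI test might look like the following sketch (the comment and body are invented for illustration; a real test would cite the issue or feature it covers):

// run-pass
// Sanity check: collecting a range produces the expected vector.

fn main() {
    let v: Vec<u32> = (0..4).collect();
    assert_eq!(v, vec![0, 1, 2, 3]);
}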
Editing and updating the reference files
If you have changed the compiler's output intentionally, or you are making a new test, you can pass `--bless` to the test subcommand. E.g. if some tests in `src/test/ui` are failing, you can run `./x.py test --stage 1 src/test/ui --bless` to automatically adjust the `.stderr`, `.stdout`, or `.fixed` files of all tests. Of course you can also target just specific tests with the `--test-args your_test_name` flag, just like when running the tests.
Normalization
The normalization applied is aimed at eliminating output difference between platforms, mainly about filenames:
- the test directory is replaced with `$DIR`
- all backslashes (`\`) are converted to forward slashes (`/`) (for Windows)
- all CR LF newlines are converted to LF
Sometimes these built-in normalizations are not enough. In such cases, you may provide custom normalization rules using the header commands, e.g.
// normalize-stdout-test: "foo" -> "bar"
// normalize-stderr-32bit: "fn\(\) \(32 bits\)" -> "fn\(\) \($$PTR bits\)"
// normalize-stderr-64bit: "fn\(\) \(64 bits\)" -> "fn\(\) \($$PTR bits\)"
This tells the test that, on 32-bit platforms, whenever the compiler writes `fn() (32 bits)` to stderr, it should be normalized to read `fn() ($PTR bits)` instead. Similarly for 64-bit. The replacement is performed by regexes, using the default regex flavor provided by the `regex` crate.
The corresponding reference file will use the normalized output to test both 32-bit and 64-bit platforms:
...
|
= note: source type: fn() ($PTR bits)
= note: target type: u16 (16 bits)
...
Please see `ui/transmute/main.rs` and `main.stderr` for a concrete usage example.
Besides `normalize-stderr-32bit` and `-64bit`, one may use any target information or stage supported by `ignore-X` here as well (e.g. `normalize-stderr-windows`, or simply `normalize-stderr-test` for unconditional replacement).
compiletest
Introduction
`compiletest` is the main test harness of the Rust test suite. It allows test authors to organize large numbers of tests (the Rust compiler has many thousands), supports efficient test execution (parallel execution is supported), and allows the test author to configure behavior and expected results of both individual and groups of tests.
`compiletest` tests may check test code for success, for failure, or, in some cases, even failure to compile. Tests are typically organized as a Rust source file with annotations in comments before and/or within the test code, which serve to direct `compiletest` on if or how to run the test, what behavior to expect, and more. If you are unfamiliar with the compiler testing framework, see this chapter for additional background.
The tests themselves are typically (but not always) organized into "suites" – for example, `run-pass`, a folder holding tests that should succeed, `run-fail`, a folder holding tests that should compile successfully but return a failure (non-zero status), `compile-fail`, a folder holding tests that should fail to compile, and many more. The various suites are defined in src/tools/compiletest/src/common.rs in the `pub struct Config` declaration. A very good introduction to the different suites of compiler tests, along with details about them, can be found in Adding new tests.
Adding a new test file
Briefly, simply create your new test in the appropriate location under src/test. No registration of test files is necessary, as `compiletest` will scan the src/test subfolder recursively and will execute any Rust source files it finds as tests. See Adding new tests for a complete guide on how to add new tests.
Header Commands
Source file annotations which appear in comments near the top of the source file before any test code are known as header commands. These commands can instruct `compiletest` to ignore this test, set expectations on whether it is expected to succeed at compiling, or what the test's return code is expected to be. Header commands (and their inline counterparts, Error Info commands) are described more fully here.
Adding a new header command
Header commands are defined in the `TestProps` struct in src/tools/compiletest/src/header.rs. At a high level, there are dozens of test properties defined here, all set to default values in the `TestProps` struct's `impl` block. Any test can override this default value by specifying the property in question as a header command, as a comment (`//`) in the test source file, before any source code.
Using a header command
Here is an example specifying the `must-compile-successfully` header command, which takes no arguments, followed by the `failure-status` header command, which takes a single argument (which, in this case, is a value of 1). `failure-status` instructs `compiletest` to expect a failure status of 1 (rather than the current Rust default of 101 at the time of this writing). The header command and the argument list (if present) are typically separated by a colon:
// must-compile-successfully
// failure-status: 1
#![feature(termination_trait)]
use std::io::{Error, ErrorKind};
fn main() -> Result<(), Box<Error>> {
Err(Box::new(Error::new(ErrorKind::Other, "returned Box<Error> from main()")))
}
Adding a new header command property
One would add a new header command if there is a need to define some test property or behavior on an individual, test-by-test basis. A header command property serves as the header command's backing store (holds the command's current value) at runtime.
To add a new header command property:
1. Look for the `pub struct TestProps` declaration in src/tools/compiletest/src/header.rs and add the new public property to the end of the declaration.
2. Look for the `impl TestProps` implementation block immediately following the struct declaration and initialize the new property to its default value.
Adding a new header command parser
When `compiletest` encounters a test file, it parses the file a line at a time by calling every parser defined in the `Config` struct's implementation block, also in src/tools/compiletest/src/header.rs (note that the `Config` struct's declaration block is found in src/tools/compiletest/src/common.rs). `TestProps`'s `load_from()` method will try passing the current line of text to each parser, which, in turn, typically checks to see if the line begins with a particular commented (`//`) header command such as `// must-compile-successfully` or `// failure-status`. Whitespace after the comment marker is optional.
A parser overrides a given header command property's default value when the corresponding header command appears in the test file; depending on the command, this happens either by mere presence of the command or by a parameter value supplied with it.
Parsers defined in impl Config are typically named parse_<header_command> (note that the kebab-case <header-command> is transformed to the snake-case <header_command>). impl Config also defines several 'low-level' parsers which make it simple to parse common patterns, such as simple presence or absence (parse_name_directive()), header-command:parameter(s) (parse_name_value_directive()), and parsing only if a particular cfg attribute is defined (has_cfg_prefix()), among many more. The low-level parsers are found near the end of the impl Config block; be sure to look through them and their associated parsers immediately above to see how they are used, to avoid writing additional parsing code unnecessarily.
As a concrete example, here is the implementation for the parse_failure_status() parser, in src/tools/compiletest/src/header.rs:
@@ -232,6 +232,7 @@ pub struct TestProps {
// customized normalization rules
pub normalize_stdout: Vec<(String, String)>,
pub normalize_stderr: Vec<(String, String)>,
+ pub failure_status: i32,
}
impl TestProps {
@@ -260,6 +261,7 @@ impl TestProps {
run_pass: false,
normalize_stdout: vec![],
normalize_stderr: vec![],
+ failure_status: 101,
}
}
@@ -383,6 +385,10 @@ impl TestProps {
if let Some(rule) = config.parse_custom_normalization(ln, "normalize-stderr") {
self.normalize_stderr.push(rule);
}
+
+ if let Some(code) = config.parse_failure_status(ln) {
+ self.failure_status = code;
+ }
});
for key in &["RUST_TEST_NOCAPTURE", "RUST_TEST_THREADS"] {
@@ -488,6 +494,13 @@ impl Config {
self.parse_name_directive(line, "pretty-compare-only")
}
+ fn parse_failure_status(&self, line: &str) -> Option<i32> {
+ match self.parse_name_value_directive(line, "failure-status") {
+ Some(code) => code.trim().parse::<i32>().ok(),
+ _ => None,
+ }
+ }
Implementing the behavior change
When a test invokes a particular header command, it is expected that some behavior will change as a result. What behavior, obviously, will depend on the purpose of the header command. In the case of failure-status, the behavior that changes is that compiletest expects the failure code defined by the header command invoked in the test, rather than the default value.
Although this is specific to failure-status (as every header command will have a different implementation in order to invoke its behavior change), it is perhaps helpful to see one such behavior change implementation, simply as an example. To implement failure-status, the check_correct_failure_status() function found in the TestCx implementation block, located in src/tools/compiletest/src/runtest.rs, was modified as below:
@@ -295,11 +295,14 @@ impl<'test> TestCx<'test> {
}
fn check_correct_failure_status(&self, proc_res: &ProcRes) {
- // The value the rust runtime returns on failure
- const RUST_ERR: i32 = 101;
- if proc_res.status.code() != Some(RUST_ERR) {
+ let expected_status = Some(self.props.failure_status);
+ let received_status = proc_res.status.code();
+
+ if expected_status != received_status {
self.fatal_proc_rec(
- &format!("failure produced the wrong error: {}", proc_res.status),
+ &format!("Error: expected failure status ({:?}) but received status {:?}.",
+ expected_status,
+ received_status),
proc_res,
);
}
@@ -320,7 +323,6 @@ impl<'test> TestCx<'test> {
);
let proc_res = self.exec_compiled_test();
-
if !proc_res.status.success() {
self.fatal_proc_rec("test run failed!", &proc_res);
}
@@ -499,7 +501,6 @@ impl<'test> TestCx<'test> {
expected,
actual
);
- panic!();
}
}
Note the use of self.props.failure_status to access the header command property. In tests which do not specify the failure status header command, self.props.failure_status will evaluate to the default value of 101 (at the time of this writing). But for a test which specifies, for example, the header command // failure-status: 1, self.props.failure_status will evaluate to 1, as parse_failure_status() will have overridden the TestProps default value, for that test specifically.
Walkthrough: a typical contribution
There are a lot of ways to contribute to the rust compiler, including fixing bugs, improving performance, helping design features, providing feedback on existing features, etc. This chapter doesn't even claim to scratch the surface of them all. Instead, it walks through the design and implementation of a new feature. Not all of the steps and processes described here are needed for every contribution, and I will try to point those out as they arise.
In general, if you are interested in making a contribution and aren't sure where to start, please feel free to ask!
Overview
The feature I will discuss in this chapter is the ? Kleene operator for macros. Basically, we want to be able to write something like this:
macro_rules! foo {
($arg:ident $(, $optional_arg:ident)?) => {
println!("{}", $arg);
$(
println!("{}", $optional_arg);
)?
}
}
fn main() {
let x = 0;
foo!(x); // ok! prints "0"
foo!(x, x); // ok! prints "0 0"
}
So basically, the $(pat)? matcher in the macro means "this pattern can occur 0 or 1 times", similar to other regex syntaxes.
There were a number of steps to go from an idea to a stable rust feature. Here is a quick list. We will go through each of these in order below. As I mentioned before, not all of these are needed for every type of contribution.
- Idea discussion/Pre-RFC A Pre-RFC is an early draft or design discussion of a feature. This stage is intended to flesh out the design space a bit and get a grasp on the different merits and problems with an idea. It's a great way to get early feedback on your idea before presenting it to the wider audience. You can find the original discussion here.
- RFC This is when you formally present your idea to the community for consideration. You can find the RFC here.
- Implementation Implement your idea unstably in the compiler. You can find the original implementation here.
- Possibly iterate/refine As the community gets experience with your feature on the nightly compiler and in libstd, there may be additional feedback about design choices that might need to be adjusted. This particular feature went through a number of iterations.
- Stabilization When your feature has baked enough, a rust team member may propose to stabilize it. If there is consensus, this is done.
- Relax Your feature is now a stable rust feature!
Pre-RFC and RFC
NOTE: In general, if you are not proposing a new feature or substantial change to rust or the ecosystem, you don't need to follow the RFC process. Instead, you can just jump to implementation.
You can find the official guidelines for when to open an RFC here.
An RFC is a document that describes the feature or change you are proposing in detail. Anyone can write an RFC; the process is the same for everyone, including rust team members.
To open an RFC, open a PR on the rust-lang/rfcs repo on GitHub. You can find detailed instructions in the README.
Before opening an RFC, you should do the research to "flesh out" your idea. Hastily-proposed RFCs tend not to be accepted. You should generally have a good description of the motivation, impact, disadvantages, and potential interactions with other features.
If that sounds like a lot of work, it's because it is. But fear not! Even if you're not a compiler hacker, you can get great feedback by doing a pre-RFC. This is an informal discussion of the idea. The best place to do this is internals.rust-lang.org. Your post doesn't have to follow any particular structure. It doesn't even need to be a cohesive idea. Generally, you will get tons of feedback that you can integrate back to produce a good RFC.
(Another pro-tip: try searching the RFCs repo and internals for prior related ideas. A lot of the time, an idea has already been considered and was either rejected or postponed to be tried again later. This can save you and everybody else some time.)
In the case of our example, a participant in the pre-RFC thread pointed out a syntax ambiguity and a potential resolution. Also, the overall feedback seemed positive. In this case, the discussion converged pretty quickly, but for some ideas, a lot more discussion can happen (e.g. see this RFC which received a whopping 684 comments!). If that happens, don't be discouraged; it means the community is interested in your idea, but it perhaps needs some adjustments.
The RFC for our ? macro feature did receive some discussion on the RFC thread too. As with most RFCs, there were a few questions that we couldn't answer by discussion: we needed experience using the feature to decide. Such questions are listed in the "Unresolved Questions" section of the RFC. Also, over the course of the RFC discussion, you will probably want to update the RFC document itself to reflect the course of the discussion (e.g. new alternatives or prior work may be added, or you may decide to change parts of the proposal itself).
In the end, when the discussion seems to reach a consensus and die down a bit, a rust team member may propose to move to FCP with one of three possible dispositions. This means that they want the other members of the appropriate teams to review and comment on the RFC. More discussion may ensue, which may result in more changes or unresolved questions being added. At some point, when everyone is satisfied, the RFC enters the "final comment period" (FCP), which is the last chance for people to bring up objections. When the FCP is over, the disposition is adopted. Here are the three possible dispositions:
- Merge: accept the feature. Here is the proposal to merge for our ? macro feature.
- Close: this feature in its current form is not a good fit for rust. Don't be discouraged if this happens to your RFC, and don't take it personally. This is not a reflection on you, but rather a community decision that rust will go a different direction.
- Postpone: there is interest in going this direction but not at the moment. This happens most often because the appropriate rust team doesn't have the bandwidth to shepherd the feature through the process to stabilization. Often this is the case when the feature doesn't fit into the team's roadmap. Postponed ideas may be revisited later.
When an RFC is merged, the PR is merged into the RFCs repo. A new tracking issue is created in the rust-lang/rust repo to track progress on the feature and discuss unresolved questions, implementation progress and blockers, etc. Here is the tracking issue for our ? macro feature.
Implementation
To make a change to the compiler, open a PR against the rust-lang/rust repo.
Depending on the feature/change/bug fix/improvement, implementation may be relatively-straightforward or it may be a major undertaking. You can always ask for help or mentorship from more experienced compiler devs. Also, you don't have to be the one to implement your feature; but keep in mind that if you don't it might be a while before someone else does.
For the ? macro feature, I needed to go understand the relevant parts of macro expansion in the compiler. Personally, I find that improving the comments in the code is a helpful way of making sure I understand it, but you don't have to do that if you don't want to.
I then implemented the original feature, as described in the RFC. When a new feature is implemented, it goes behind a feature gate, which means that you have to use #![feature(my_feature_name)] to use the feature. The feature gate is removed when the feature is stabilized.
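For example, a nightly-only program (or a test) opting in to a gated feature looks like this (the feature name is made up for illustration):

// This only compiles on a nightly compiler, where unstable
// features may be enabled at the crate root.
#![feature(my_feature_name)] // hypothetical feature gate

fn main() {
    // ... code exercising the gated feature ...
}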
Most bug fixes and improvements don't require a feature gate. You can just make your changes/improvements.
When you open a PR on rust-lang/rust, a bot will assign your PR to a reviewer. If there is a particular rust team member you are working with, you can request that reviewer by leaving a comment on the thread with r? @reviewer-github-id (e.g. r? @eddyb). If you don't know who to request, don't request anyone; the bot will assign someone automatically.
The reviewer may request changes before they approve your PR. Feel free to ask questions or discuss things you don't understand or disagree with. However, recognize that the PR won't be merged unless someone on the rust team approves it.
When your reviewer approves the PR, it will go into a queue for yet another bot called @bors. @bors manages the CI build/merge queue. When your PR reaches the head of the @bors queue, @bors will test out the merge by running all tests against your PR on Travis CI. This takes about 2 hours as of this writing. If all tests pass, the PR is merged and becomes part of the next nightly compiler!
There are a couple of things that may happen for some PRs during the review process:
- If the change is substantial enough, the reviewer may request an FCP on the PR. This gives all members of the appropriate team a chance to review the changes.
- If the change may cause breakage, the reviewer may request a crater run. This compiles the compiler with your changes and then attempts to compile all crates on crates.io with your modified compiler. This is a great smoke test to check if you introduced a change to compiler behavior that affects a large portion of the ecosystem.
- If the diff of your PR is large or the reviewer is busy, your PR may have some merge conflicts with other PRs that happen to get merged first. You should fix these merge conflicts using the normal git procedures.
If you are not doing a new feature or something like that (e.g. if you are fixing a bug), then that's it! Thanks for your contribution :)
Refining your implementation
As people get experience with your new feature on nightly, slight changes may be proposed and unresolved questions may become resolved. Updates/changes go through the same process as any other change, as described above (i.e. submit a PR, go through review, wait for @bors, etc).
Some changes may be major enough to require an FCP and some review by rust team members.
For the ? macro feature, we went through a few different iterations after the original implementation: 1, 2, 3.
Along the way, we decided that ? should not take a separator, which was previously an unresolved question listed in the RFC. We also changed the disambiguation strategy: we decided to remove the ability to use ? as a separator token for other repetition operators (e.g. + or *). However, since this was a breaking change, we decided to do it over an edition boundary. Thus, the new feature can be enabled only in edition 2018. These deviations from the original RFC required another FCP.
Stabilization
Finally, after the feature had baked for a while on nightly, a language team member moved to stabilize it.
A stabilization report needs to be written that includes:
- a brief description of the behavior and any deviations from the RFC
- which edition(s) are affected and how
- links to a few tests to show the interesting aspects
The stabilization report for our feature is here.
After this, a PR is made to remove the feature gate, enabling the feature by default (on the 2018 edition). A note is added to the Release notes about the feature.
Steps to stabilize the feature can be found at Stabilizing Features.
Implement New Feature
When you want to implement a new significant feature in the compiler, you need to go through this process to make sure everything goes smoothly.
The @rfcbot (p)FCP process
When the change is small and uncontroversial, then it can be done with just writing a PR and getting r+ from someone who knows that part of the code. However, if the change is potentially controversial, it would be a bad idea to push it without consensus from the rest of the team (both in the "distributed system" sense to make sure you don't break anything you don't know about, and in the social sense to avoid PR fights).
If such a change seems to be too small to require a full formal RFC process (e.g. a big refactoring of the code, or a "technically-breaking" change, or a "big bugfix" that basically amounts to a small feature) but is still too controversial or big to get by with a single r+, you can start a pFCP (or, if you don't have r+ rights, ask someone who has them to start one - and unless they have a concern themselves, they should).
Again, the pFCP process is only needed if you need consensus - if you don't think anyone would have a problem with your change, it's ok to get by with only an r+. For example, it is OK to add or modify unstable command-line flags or attributes without a pFCP for compiler development or standard library use, as long as you don't expect them to be in wide use in the nightly ecosystem.
You don't need to have the implementation fully ready for r+ to ask for a pFCP, but it is generally a good idea to have at least a proof of concept so that people can see what you are talking about.
That starts a "proposed final comment period" (pFCP), which requires all members of the team to sign off on the FCP. After they all do so, there's a week-long "final comment period" where everybody can comment, and if no new concerns are raised, the PR/issue gets FCP approval.
The logistics of writing features
There are a few "logistic" hoops you might need to go through in order to implement a feature in a working way.
Warning Cycles
In some cases, a feature or bugfix might break some existing programs in some edge cases. In that case, you might want to do a crater run to assess the impact and possibly add a future-compatibility lint, similar to those used for edition-gated lints.
Stability
We value the stability of Rust. Code that works and runs on stable should (mostly) not break. Because of that, we don't want to release a feature to the world with only team consensus and code review - we want to gain real-world experience on using that feature on nightly, and we might want to change the feature based on that experience.
To allow for that, we must make sure users don't accidentally depend on that new feature - otherwise, especially if experimentation takes time or is delayed and the feature takes the trains to stable, it would end up de facto stable and we won't be able to make changes to it without breaking people's code.
The way we do that is by making sure all new features are feature gated - they can't be used without enabling the feature gate (#![feature(foo)]), which can't be done in a stable/beta compiler. See the stability in code section for the technical details.
Eventually, after we gain enough experience using the feature, make the necessary changes, and are satisfied, we expose it to the world using the stabilization process described here. Until then, the feature is not set in stone: every part of the feature can be changed, or the feature might be completely rewritten or removed. Features are not supposed to gain tenure by being unstable and unchanged for a year.
Tracking Issues
To keep track of the status of an unstable feature, the experience we get while using it on nightly, and of the concerns that block its stabilization, every feature-gate needs a tracking issue.
General discussions about the feature should be done on the tracking issue.
For features that have an RFC, you should use the RFC's tracking issue for the feature.
For other features, you'll have to make a tracking issue for that feature. The issue title should be "Tracking issue for YOUR FEATURE".
For tracking issues for features (as opposed to future-compat warnings), I don't think the description has to contain anything specific. Generally we put the list of items required for stabilization using a github list, e.g.
**Steps:**
- [ ] Implement the RFC (cc @rust-lang/compiler -- can anyone write
up mentoring instructions?)
- [ ] Adjust documentation ([see instructions on forge][doc-guide])
- Note: no stabilization step here.
Stability in code
The below steps need to be followed in order to implement a new unstable feature:
1. Open a tracking issue - if you have an RFC, you can use the tracking issue for the RFC.
2. Pick a name for the feature gate (for RFCs, use the name in the RFC).
3. Add a feature gate declaration to libsyntax/feature_gate.rs in the active declare_features block:
// description of feature
(active, $feature_name, "$current_nightly_version", Some($tracking_issue_number), $edition)
where $edition has the type Option<Edition>, and is typically just None. For example:
// allow '|' at beginning of match arms (RFC 1925)
( active, match_beginning_vert, "1.21.0", Some(44101), None),
The current version is not actually important – the important version is when you are stabilizing a feature.
4. Prevent usage of the new feature unless the feature gate is set. You can check it in most places in the compiler using the expression tcx.features().$feature_name (or sess.features_untracked().$feature_name if the tcx is unavailable). If the feature gate is not set, you should either maintain the pre-feature behavior or raise an error, depending on what makes sense; see the sketch after this list.
5. Add a test to ensure the feature cannot be used without a feature gate, by creating feature-gate-$feature_name.rs and feature-gate-$feature_name.stderr files under the src/test/ui/feature-gates directory.
6. Add a section to the unstable book, in src/doc/unstable-book/src/language-features/$feature_name.md.
7. Write lots of tests for the new feature. PRs without tests will not be accepted!
8. Get your PR reviewed and land it. You have now successfully implemented a feature in Rust!
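As a sketch of the gating check in step 4, here is a toy model of the pattern (this is illustrative only, not actual compiler code; the feature name and error text are made up):

// Toy model of "prevent usage unless the feature gate is set".
struct Features {
    my_feature_name: bool, // hypothetical feature gate
}

fn check_new_syntax(features: &Features, uses_new_syntax: bool) -> Result<(), String> {
    if uses_new_syntax && !features.my_feature_name {
        // In rustc this would be reported through the usual
        // feature-gating error machinery rather than a String.
        return Err("feature `my_feature_name` is experimental; \
                    add #![feature(my_feature_name)] to enable it"
            .to_string());
    }
    Ok(())
}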
Request for stabilization
Once an unstable feature has been well-tested with no outstanding concerns, anyone may push for its stabilization. It involves the following steps.
- Documentation PRs
- Write a stabilization report
- FCP
- Stabilization PR
Documentation PRs
If any documentation for this feature exists, it should be in the Unstable Book, located at src/doc/unstable-book. If it exists, the page for the feature gate should be removed. If there was documentation there, it needs to be integrated into the existing documentation; if there wasn't, it needs to be added.
Places that may need updated documentation:
- The Reference: This must be updated, in full detail.
- The Book: This may or may not need updating, depending on the feature. If you're not sure, please open an issue on this repository and it can be discussed.
- Standard library documentation: As needed. Language features often don't need this, but if it's a feature that changes how good examples are written, such as when ? was added to the language, updating examples is important.
- Rust by Example: As needed.
Prepare PRs to update documentation involving this new feature for the repositories mentioned above. Maintainers of these repositories will keep these PRs open until the whole stabilization process has completed. Meanwhile, we can proceed to the next step.
Write a stabilization report
Find the tracking issue of the feature, and create a short stabilization report. Essentially this would be a brief summary of the feature plus some links to test cases showing it works as expected, along with a list of edge cases that came up and were considered. This is a minimal "due diligence" that we do before stabilizing.
The report should contain:
- A summary, showing examples (e.g. code snippets) of what is enabled by this feature.
- Links to test cases in our test suite regarding this feature, describing the feature's behavior on encountering edge cases.
- Links to the documentation (the PRs we have made in the previous steps).
- Any other relevant information. (Examples of such reports can be found in rust-lang/rust#44494 and rust-lang/rust#28237.)
- The resolutions of any unresolved questions, if the stabilization is for an RFC.
FCP
If any member of the team responsible for tracking this feature agrees with stabilizing this feature, they will start the FCP (final-comment-period) process by commenting
@rfcbot fcp merge
The rest of the team members will review the proposal. If the final decision is to stabilize, we proceed to do the actual code modification.
Stabilization PR
Once we have decided to stabilize a feature, we need to have a PR that actually makes that stabilization happen. These kinds of PRs are a great way to get involved in Rust, as they take you on a little tour through the source code.
Here is a general guide to how to stabilize a feature -- every feature is different, of course, so some features may require steps beyond what this guide talks about.
Note: Before we stabilize any feature, it's the rule that it should appear in the documentation.
Updating the feature-gate listing
There is a central listing of feature-gates in src/libsyntax/feature_gate.rs. Search for the declare_features! macro. There should be an entry for the feature you are aiming to stabilize, something like this (this example is taken from rust-lang/rust#32409):
// pub(restricted) visibilities (RFC 1422)
(active, pub_restricted, "1.9.0", Some(32409)),
The above line should be moved down to the area for "accepted" features, declared below in a separate call to declare_features!. When it is done, it should look like:
// pub(restricted) visibilities (RFC 1422)
(accepted, pub_restricted, "1.31.0", Some(32409)),
// note that we changed this
Note that the version number is updated to be the version number of the stable release where this feature will appear. This can be found by consulting the forge, which will tell you the next stable release number. You want to add 1 to that, because the code that lands today will go into beta on that date, and will then become stable after that. So, at the time of this writing, the next stable release (i.e. what is currently beta) was 1.30.0, hence I wrote 1.31.0 above.
Removing existing uses of the feature-gate
Next, search for the feature string (in this case, pub_restricted) in the codebase to find where it appears. Change uses of #![feature(XXX)] in libstd and any rustc crates to #![cfg_attr(stage0, feature(XXX))]. This includes the feature-gate only for stage0, which is built using the current beta (this is needed because the feature is still unstable in the current beta).
Also, remove those strings from any tests. If there are tests specifically targeting the feature-gate (i.e., testing that the feature-gate is required to use the feature, but nothing else), simply remove the test.
Do not require the feature-gate to use the feature
Most importantly, remove the code which flags an error if the feature-gate is not present (since the feature is now considered stable). If the feature can be detected because it employs some new syntax, then a common place for that code to be is in the same feature_gate.rs. For example, you might see code like this:
gate_feature_post!(&self, pub_restricted, span,
"`pub(restricted)` syntax is experimental");
This gate_feature_post! macro prints an error if the pub_restricted feature is not enabled. It is not needed now that the feature is stable.
For more subtle features, you may find code like this:
if self.tcx.sess.features.borrow().pub_restricted { /* XXX */ }
This pub_restricted field (obviously named after the feature) would ordinarily be false if the feature flag is not present and true if it is. So transform the code to assume that the field is true. In this case, that would mean removing the if and leaving just the /* XXX */.
if self.tcx.sess.features.borrow().pub_restricted { /* XXX */ }
becomes
/* XXX */
if self.tcx.sess.features.borrow().pub_restricted && something { /* XXX */ }
becomes
if something { /* XXX */ }
Debugging the compiler
This chapter contains a few tips to debug the compiler. These tips aim to be useful no matter what you are working on. Some of the other chapters have advice about specific parts of the compiler (e.g. the Queries Debugging and Testing chapter or the LLVM Debugging chapter).
-Z flags
The compiler has a bunch of -Z flags. These are unstable flags that are only enabled on nightly. Many of them are useful for debugging. To get a full listing of -Z flags, use -Z help.
One useful flag is -Z verbose, which generally enables printing more info that could be useful for debugging.
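For example, assuming you have a toolchain linked to your build (here named stage1; the name is whatever you chose):

> rustc +stage1 -Z help
> rustc +stage1 -Z verbose my-file.rs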
Getting a backtrace
When you have an ICE (panic in the compiler), you can set RUST_BACKTRACE=1 to get the stack trace of the panic! like in normal Rust programs. IIRC backtraces don't work on Mac and on MinGW, sorry. If you have trouble or the backtraces are full of unknown, you might want to find some way to use Linux or MSVC on Windows.
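For example, with a stage1 build (the path is illustrative and depends on your host triple):

> RUST_BACKTRACE=1 ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc my-file.rs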
In the default configuration, you don't have line numbers enabled, so the backtrace looks like this:
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
1: std::sys_common::backtrace::_print
2: std::panicking::default_hook::{{closure}}
3: std::panicking::default_hook
4: std::panicking::rust_panic_with_hook
5: std::panicking::begin_panic
(~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
32: rustc_typeck::check_crate
33: <std::thread::local::LocalKey<T>>::with
34: <std::thread::local::LocalKey<T>>::with
35: rustc::ty::context::TyCtxt::create_and_enter
36: rustc_driver::driver::compile_input
37: rustc_driver::run_compiler
If you want line numbers for the stack trace, you can enable debug = true in your config.toml and rebuild the compiler (debuginfo-level = 1 will also add line numbers, but debug = true gives full debuginfo). Then the backtrace will look like this:
stack backtrace:
(~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
at /home/user/rust/src/librustc_typeck/check/cast.rs:110
7: rustc_typeck::check::cast::CastCheck::check
at /home/user/rust/src/librustc_typeck/check/cast.rs:572
at /home/user/rust/src/librustc_typeck/check/cast.rs:460
at /home/user/rust/src/librustc_typeck/check/cast.rs:370
(~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
33: rustc_driver::driver::compile_input
at /home/user/rust/src/librustc_driver/driver.rs:1010
at /home/user/rust/src/librustc_driver/driver.rs:212
34: rustc_driver::run_compiler
at /home/user/rust/src/librustc_driver/lib.rs:253
Getting a backtrace for errors
If you want to get a backtrace to the point where the compiler emits an error message, you can pass the -Z treat-err-as-bug=n flag, which will make the compiler skip n errors or delay_span_bug calls and then panic on the next one. If you leave off =n, the compiler will assume 0 for n and thus panic on the first error it encounters.
This can also help when debugging delay_span_bug calls - it will make the first delay_span_bug call panic, which will give you a useful backtrace.
For example:
$ cat error.rs
fn main() {
1 + ();
}
$ ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc error.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> error.rs:2:7
|
2 | 1 + ();
| ^ no implementation for `{integer} + ()`
|
= help: the trait `std::ops::Add<()>` is not implemented for `{integer}`
error: aborting due to previous error
$ # Now, where does the error above come from?
$ RUST_BACKTRACE=1 \
./build/x86_64-unknown-linux-gnu/stage1/bin/rustc \
error.rs \
-Z treat-err-as-bug
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> error.rs:2:7
|
2 | 1 + ();
| ^ no implementation for `{integer} + ()`
|
= help: the trait `std::ops::Add<()>` is not implemented for `{integer}`
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: rustc 1.24.0-dev running on x86_64-unknown-linux-gnu
note: run with `RUST_BACKTRACE=1` for a backtrace
thread 'rustc' panicked at 'encountered error with `-Z treat_err_as_bug',
/home/user/rust/src/librustc_errors/lib.rs:411:12
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
stack backtrace:
(~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
7: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx,
'tcx>>::report_selection_error
at /home/user/rust/src/librustc/traits/error_reporting.rs:823
8: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx,
'tcx>>::report_fulfillment_errors
at /home/user/rust/src/librustc/traits/error_reporting.rs:160
at /home/user/rust/src/librustc/traits/error_reporting.rs:112
9: rustc_typeck::check::FnCtxt::select_obligations_where_possible
at /home/user/rust/src/librustc_typeck/check/mod.rs:2192
(~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
36: rustc_driver::run_compiler
at /home/user/rust/src/librustc_driver/lib.rs:253
$ # Cool, now I have a backtrace for the error
Getting logging output
The compiler uses the following crates for logging:
- log
- env_logger: check the link to see the full RUSTC_LOG syntax
The compiler has a lot of debug! calls, which print out logging information at many points. These are very useful to at least narrow down the location of a bug if not to find it entirely, or just to orient yourself as to why the compiler is doing a particular thing.
To see the logs, you need to set the RUSTC_LOG environment variable to your log filter, e.g. to get the logs for a specific module, you can run the compiler as RUSTC_LOG=module::path rustc my-file.rs. All debug! output will then appear in standard error.
Note that unless you use a very strict filter, the logger will emit a lot of output, so use the most specific module(s) you can (comma-separated if multiple). It's typically a good idea to pipe standard error to a file and look at the log output with a text editor.
So, to put it all together:
# This puts the output of all debug calls in `librustc/traits` into
# standard error, which might fill your console backscroll.
$ RUSTC_LOG=rustc::traits rustc +local my-file.rs
# This puts the output of all debug calls in `librustc/traits` in
# `traits-log`, so you can then see it with a text editor.
$ RUSTC_LOG=rustc::traits rustc +local my-file.rs 2>traits-log
# Not recommended. This will show the output of all `debug!` calls
# in the Rust compiler, and there are a *lot* of them, so it will be
# hard to find anything.
$ RUSTC_LOG=debug rustc +local my-file.rs 2>all-log
# This will show the output of all `info!` calls in `rustc_trans`.
#
# There's an `info!` statement in `trans_instance` that outputs
# every function that is translated. This is useful to find out
# which function triggers an LLVM assertion, and this is an `info!`
# log rather than a `debug!` log so it will work on the official
# compilers.
$ RUSTC_LOG=rustc_trans=info rustc +local my-file.rs
How to keep or remove debug! and trace! calls from the resulting binary
While calls to error!, warn! and info! are included in every build of the compiler, calls to debug! and trace! are only included in the program if debug-assertions=yes is turned on in config.toml (it is turned off by default). So if you don't see DEBUG logs – especially if you run the compiler with RUSTC_LOG=rustc rustc some.rs and only see INFO logs – make sure that debug-assertions=yes is turned on in your config.toml.
I also think that in some cases just setting it will not trigger a rebuild, so if you changed it and you already have a compiler built, you might want to call x.py clean to force one.
Logging etiquette and conventions
Because calls to debug! are removed by default, in most cases, don't worry about adding "unnecessary" calls to debug! and leaving them in code you commit - they won't slow down the performance of what we ship, and if they helped you pin down a bug, they will probably help someone else with a different one.
A loosely followed convention is to use debug!("foo(...)") at the start of a function foo and debug!("foo: ...") within the function. Another loosely followed convention is to use the {:?} format specifier for debug logs.
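Following those conventions, logging in a function might look like this sketch (the function is made up; it assumes the log crate's debug! macro is in scope):

use log::debug;

fn process_obligation(obligation: &str) -> usize {
    // Entry log: function name plus arguments, using {:?}.
    debug!("process_obligation(obligation={:?})", obligation);
    let result = obligation.len();
    // Interior log: the "foo: ..." form.
    debug!("process_obligation: result={:?}", result);
    result
}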
One thing to be careful of is expensive operations in logs.
If in the module rustc::foo you have a statement
debug!("{:?}", random_operation(tcx));
then if someone runs a debug rustc with RUSTC_LOG=rustc::bar, random_operation() will still run.
This means that you should not put anything too expensive or likely to crash there - that would annoy anyone who wants to use logging for their own module. No-one will know it until someone tries to use logging to find another bug.
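If an expensive computation is genuinely needed in a log message, one mitigation (a sketch, using the log crate's log_enabled! macro; random_operation is the hypothetical expensive call from above) is to guard the call so the work only happens when debug logging is actually enabled for the module:

if log_enabled!(log::Level::Debug) {
    // random_operation() only runs when debug logging is active here.
    debug!("{:?}", random_operation(tcx));
}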
Formatting Graphviz output (.dot files)
Some compiler options for debugging specific features yield graphviz graphs -
e.g. the #[rustc_mir(borrowck_graphviz_postflow="suffix.dot")]
attribute
dumps various borrow-checker dataflow graphs.
These all produce .dot files. To view these files, install graphviz (e.g. apt-get install graphviz) and then run the following commands:
$ dot -T pdf maybe_init_suffix.dot > maybe_init_suffix.pdf
$ firefox maybe_init_suffix.pdf # Or your favorite pdf viewer
Narrowing (Bisecting) Regressions
The cargo-bisect-rustc tool can be used as a quick and easy way to find exactly which PR caused a change in rustc behavior. It automatically downloads rustc PR artifacts and tests them against a project you provide until it finds the regression. You can then look at the PR to get more context on why it was changed. See this tutorial on how to use it.
Downloading Artifacts from Rust's CI
The rustup-toolchain-install-master tool by kennytm can be used to download the artifacts produced by Rust's CI for a specific SHA1 – this basically corresponds to the successful landing of some PR – and then sets them up for your local use. This also works for artifacts produced by @bors try. This is helpful when you want to examine the resulting build of a PR without doing the build yourself.
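A typical session might look like this (the SHA1 is a placeholder; check the tool's README for the exact options):

> rustup-toolchain-install-master <SHA1>
> rustc +<SHA1> my-file.rs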
Profiling the compiler
This discussion talks about how to profile the compiler and find out where it spends its time. If you just want to get a general overview, it is often a good idea to just add the -Zself-profile option to the rustc command line. This will break down time spent into various categories. But if you want a more detailed look, you probably want to break out a custom profiler.
Profiling with perf
This is a guide for how to profile rustc with perf.
Initial steps
- Get a clean checkout of rust-lang/master, or whatever it is you want to profile.
- Set the following settings in your config.toml:
  - debuginfo-level = 1 - enables line debuginfo
  - use-jemalloc = false - lets you do memory use profiling with valgrind
  - leave everything else at the defaults
- Run ./x.py build to get a full build
- Make a rustup toolchain pointing to that result (see the example after this list)
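For that last step, rustup toolchain link can register the build directory as a toolchain (the name and host triple below are illustrative):

> rustup toolchain link mytoolchain build/x86_64-unknown-linux-gnu/stage2
> rustc +mytoolchain --version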
Gathering a perf profile
perf is an excellent tool on linux that can be used to gather and analyze all kinds of information. Mostly it is used to figure out where a program spends its time. It can also be used for other sorts of events, though, like cache misses and so forth.
The basics
The basic perf command is this:
> perf record -F99 --call-graph dwarf XXX
The -F99 tells perf to sample at 99 Hz, which avoids generating too much data for longer runs (why 99 Hz, you ask? It is often chosen because it is unlikely to be in lockstep with other periodic activity). The --call-graph dwarf tells perf to get call-graph information from debuginfo, which is accurate. The XXX is the command you want to profile. So, for example, you might do:
> perf record -F99 --call-graph dwarf cargo +<toolchain> rustc
to run cargo – here <toolchain> should be the name of the toolchain you made in the beginning. But there are some things to be aware of:
- You probably don't want to profile the time spent building dependencies. So something like cargo build; cargo clean -p $C may be helpful (where $C is the crate name)
  - Though usually I just do touch src/lib.rs and rebuild instead. =)
- You probably don't want incremental messing about with your profile. So something like CARGO_INCREMENTAL=0 can be helpful.
Gathering a perf profile from a perf.rust-lang.org test
Often we want to analyze a specific test from perf.rust-lang.org. To do that, the first step is to clone the rustc-perf repository:
> git clone https://github.com/rust-lang-nursery/rustc-perf
Doing it the easy way
Once you've cloned the repo, you can use the collector executable to do profiling for you! You can find instructions in the rustc-perf readme.
For example, to measure the clap-rs test, you might do:
> ./target/release/collector
--output-repo /path/to/place/output
profile perf-record
--rustc /path/to/rustc/executable/from/your/build/directory
--cargo `which cargo`
--filter clap-rs
--builds Check
You can also use that same command to use cachegrind or other profiling tools.
Doing it the hard way
If you prefer to run things manually, that is also possible. You first need to find the source for the test you want. Sources for the tests are found in the collector/benchmarks directory. So let's go into the directory of a specific test; we'll use clap-rs as an example:
> cd collector/benchmarks/clap-rs
In this case, let's say we want to profile the cargo check performance. In that case, I would first run some basic commands to build the dependencies:
# Setup: first clean out any old results and build the dependencies:
> cargo +<toolchain> clean
> CARGO_INCREMENTAL=0 cargo +<toolchain> check
(Again, <toolchain> should be replaced with the name of the toolchain we made in the first step.)
Next, we want to record the execution time for just the clap-rs crate, running cargo check. I tend to use cargo rustc for this, since it also allows me to add explicit flags, which we'll do later on.
> touch src/lib.rs
> CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib
Note that final command: it's a doozy! It uses the cargo rustc command, which executes rustc with (potentially) additional options; the --profile check and --lib options specify that we are doing a cargo check execution, and that this is a library (not a binary).
At this point, we can use perf tooling to analyze the results. For example:
> perf report
will open up an interactive TUI program. In simple cases, that can be helpful. For more detailed examination, the perf-focus tool can be helpful; it is covered below.
A note of caution: each of the rustc-perf tests is its own special snowflake. In particular, some of them are not libraries, in which case you would want to do touch src/main.rs and avoid passing --lib. I'm not sure how best to tell which test is which, to be honest.
Gathering NLL data
If you want to profile an NLL run, you can just pass extra options to the cargo rustc command, like so:
> touch src/lib.rs
> CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib -- -Zborrowck=mir
Analyzing a perf profile with perf focus
Once you've gathered a perf profile, we want to get some information about it. For this, I personally use perf focus. It's a simple but useful tool that lets you answer queries like:
- "how much time was spent in function F" (no matter where it was called from)
- "how much time was spent in function F when it was called from G"
- "how much time was spent in function F excluding time spent in G"
- "what functions does F call and how much time does it spend in them"
To understand how it works, you have to know just a bit about perf. Basically, perf works by sampling your process on a regular basis (or whenever some event occurs). For each sample, perf gathers a backtrace. perf focus lets you write a regular expression that tests which functions appear in that backtrace, and then tells you which percentage of samples had a backtrace that met the regular expression. It's probably easiest to explain by walking through how I would analyze NLL performance.
Installing perf-focus
You can install perf-focus using cargo install:
> cargo install perf-focus
Example: How much time is spent in MIR borrowck?
Let's say we've gathered the NLL data for a test. We'd like to know how much time it is spending in the MIR borrow-checker. The "main" function of the MIR borrowck is called do_mir_borrowck, so we can do this command:
> perf focus '{do_mir_borrowck}'
Matcher : {do_mir_borrowck}
Matches : 228
Not Matches: 542
Percentage : 29%
The '{do_mir_borrowck}' argument is called the matcher. It specifies the test to be applied on the backtrace. In this case, the {X} indicates that there must be some function on the backtrace that meets the regular expression X. In this case, that regex is just the name of the function we want (in fact, it's a subset of the name; the full name includes a bunch of other stuff, like the module path). In this mode, perf-focus just prints out the percentage of samples where do_mir_borrowck was on the stack: in this case, 29%.
A note about c++filt. To get the data from perf, perf focus currently executes perf script (perhaps there is a better way...). I've sometimes found that perf script outputs C++ mangled names. This is annoying. You can tell by running perf script | head yourself – if you see names like 5rustc6middle instead of rustc::middle, then you have the same problem. You can solve this by doing:
> perf script | c++filt | perf focus --from-stdin ...
This will pipe the output from perf script through c++filt and should mostly convert those names into a more friendly format. The --from-stdin flag to perf focus tells it to get its data from stdin, rather than executing perf script itself. We should make this more convenient (at worst, maybe add a c++filt option to perf focus, or just always use it – it's pretty harmless).
Example: How much time does MIR borrowck spend solving traits?
Perhaps we'd like to know how much time MIR borrowck spends in the trait checker. We can ask this using a more complex regex:
> perf focus '{do_mir_borrowck}..{^rustc::traits}'
Matcher : {do_mir_borrowck},..{^rustc::traits}
Matches : 12
Not Matches: 1311
Percentage : 0%
Here we used the .. operator to ask "how often do we have do_mir_borrowck on the stack and then, later, some function whose name begins with rustc::traits?" (basically, code in that module). It turns out the answer is "almost never" – only 12 samples fit that description. (If you ever see no samples, that often indicates your query is messed up.)
If you're curious, you can find out exactly which samples matched by using the --print-match option. This will print out the full backtrace for each sample. The | at the front of the line indicates the part that the regular expression matched.
Example: Where does MIR borrowck spend its time?
Often we want to do more "explorational" queries. Like, we know that MIR borrowck is 29% of the time, but where does that time get spent? For that, the --tree-callees option is often the best tool. You usually also want to give --tree-min-percent or --tree-max-depth. The result looks like this:
> perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher : {do_mir_borrowck}
Matches : 577
Not Matches: 746
Percentage : 43%
Tree
| matched `{do_mir_borrowck}` (43% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (20% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (13% total, 0% self)
: : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (5% total, 3% self)
: : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'gcx, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
: | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
What happens with --tree-callees is that
- we find each sample matching the regular expression
- we look at the code that occurs after the regex match and try to build up a call tree
The --tree-min-percent 3 option says "only show me things that take more than 3% of the time". Without this, the tree often gets really noisy and includes random stuff like the innards of malloc. --tree-max-depth can be useful too; it just limits how many levels we print.
For each line, we display the percent of time in that function altogether ("total") and the percent of time spent in just that function and not some callee of that function (self). Usually "total" is the more interesting number, but not always.
Relative percentages
By default, all percentages in perf-focus are relative to the total program execution. This is useful to help you keep perspective – often as we drill down to find hot spots, we can lose sight of the fact that, in terms of overall program execution, this "hot spot" is actually not important. It also ensures that percentages between different queries are easily compared against one another.
That said, sometimes it's useful to get relative percentages, so perf focus offers a --relative option. In this case, the percentages are listed only for samples that match (vs all samples). So, for example, we could get our percentages relative to the borrowck itself, like so:
> perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
Matcher : {do_mir_borrowck}
Matches : 577
Not Matches: 746
Percentage : 100%
Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (47% total, 0% self) [...]
: | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
: | rustc_mir::dataflow::do_dataflow (8% total, 1% self) [...]
Here you see that compute_regions came up as "47% total" – that means that 47% of do_mir_borrowck is spent in that function. Before, we saw 20% – that's because do_mir_borrowck itself is only 43% of the total time (and .47 * .43 = .20).
Coding conventions
This chapter offers some tips on the coding conventions for rustc. It covers formatting, coding for correctness, using crates from crates.io, and some tips on structuring your PR for easy review.
Formatting and the tidy script
rustc is slowly moving towards the Rust standard coding style;
at the moment, however, it follows a rather more chaotic style. We
do have some mandatory formatting conventions, which are automatically
enforced by a script we affectionately call the "tidy" script. The tidy script runs automatically when you do ./x.py test and can be run in isolation with ./x.py test src/tools/tidy.
Copyright notice
In the past, files began with a copyright and license notice. Please omit this notice for new files licensed under the standard terms (dual MIT/Apache-2.0).
All of the copyright notices should be gone by now, but if you come across one in the rust-lang/rust repo, feel free to open a PR to remove it.
Line length
Lines should be at most 100 characters. It's even better if you can keep things to 80.
Ignoring the line length limit. Sometimes – in particular for tests – it can be necessary to exempt yourself from this limit. In that case, you can add a comment towards the top of the file (after the copyright notice) like so:
// ignore-tidy-linelength
Tabs vs spaces
Prefer 4-space indent.
Coding for correctness
Beyond formatting, there are a few other tips that are worth following.
Prefer exhaustive matches
Using _ in a match is convenient, but it means that when new variants are added to the enum, they may not get handled correctly. Ask yourself: if a new variant were added to this enum, what's the chance that it would want to use the _ code, versus having some other treatment? Unless the answer is "low", then prefer an exhaustive match. (The same advice applies to if let and while let, which are effectively tests for a single variant.)
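As a small, self-contained illustration of the trade-off (the enum is made up):

enum Mode { Normal, Strict }

fn describe(mode: &Mode) -> &'static str {
    match mode {
        Mode::Normal => "normal",
        // Spelling out the remaining variant instead of using `_`
        // means that adding a future `Mode::Lenient` variant will
        // cause a compile error here, forcing a conscious decision.
        Mode::Strict => "strict",
    }
}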
Use "TODO" comments for things you don't want to forget
As a useful tool to yourself, you can insert a // TODO comment for something that you want to get back to before you land your PR:
fn do_something() {
if something_else {
unimplemented!(); // TODO write this
}
}
The tidy script will report an error for a // TODO comment, so this code would not be able to land until the TODO is fixed (or removed).
This can also be useful in a PR as a way to signal from one commit that you are leaving a bug that a later commit will fix:
if foo {
return true; // TODO wrong, but will be fixed in a later commit
}
Using crates from crates.io
It is allowed to use crates from crates.io, though external dependencies should not be added gratuitously. All such crates must have a suitably permissive license. There is an automatic check which inspects the Cargo metadata to ensure this.
How to structure your PR
How you prepare the commits in your PR can make a big difference for the reviewer. Here are some tips.
Isolate "pure refactorings" into their own commit. For example, if you rename a method, then put that rename into its own commit, along with the renames of all the uses.
More commits is usually better. If you are doing a large change, it's almost always better to break it up into smaller steps that can be independently understood. The one thing to be aware of is that if you introduce some code following one strategy, then change it dramatically (versus adding to it) in a later commit, that 'back-and-forth' can be confusing.
If you run rustfmt and the file was not already formatted, isolate that into its own commit. This is really the same as the previous rule, but it's worth highlighting. It's ok to rustfmt files, but since we do not currently run rustfmt all the time, that can introduce a lot of noise into your commit. Please isolate that into its own commit. This also makes rebases a lot less painful, since rustfmt tends to cause a lot of merge conflicts, and having those isolated into their own commit makes them easier to resolve.
No merges. We do not allow merge commits into our history, other than those by bors. If you get a merge conflict, rebase instead via a command like git rebase -i rust-lang/master (presuming you use the name rust-lang for your remote).
Individual commits do not have to build (but it's nice). We do not require that every intermediate commit successfully builds – we only expect to be able to bisect at a PR level. However, if you can make individual commits build, that is always helpful.
Naming conventions
Apart from normal Rust style/naming conventions, there are also some specific to the compiler.
- cx tends to be short for "context" and is often used as a suffix. For example, tcx is a common name for the Typing Context.
- 'tcx and 'gcx are used as the lifetime names for the Typing Context.
- Because crate is a keyword, if you need a variable to represent something crate-related, often the spelling is changed to krate.
crates.io Dependencies
The rust compiler supports building with some dependencies from crates.io. For example, log and env_logger come from crates.io.
In general, you should avoid adding dependencies to the compiler for several reasons:
- The dependency may not be high quality or well-maintained, whereas we want the compiler to be high-quality.
- The dependency may not be using a compatible license.
- The dependency may have transitive dependencies that have one of the above problems.
TODO: what is the vetting process?
Whitelist
The tidy tool has a whitelist of crates that are allowed. To add a dependency that is not already in the compiler, you will need to add it to this whitelist.
Emitting Errors and other Diagnostics
A lot of effort has been put into making rustc have great error messages. This chapter is about how to emit compile errors and lints from the compiler.
Span
Span is the primary data structure in rustc used to represent a location in the code being compiled. Spans are attached to most constructs in HIR and MIR, allowing for more informative error reporting.
A Span can be looked up in a SourceMap to get a "snippet" useful for displaying errors, with span_to_snippet and other similar methods on the SourceMap.
Error messages
The rustc_errors crate defines most of the utilities used for reporting errors.
Session and ParseSess have methods (or fields with methods) that allow reporting errors. These methods usually have names like span_err or struct_span_err or span_warn, etc... There are lots of them; they emit different types of "errors", such as warnings, errors, fatal errors, suggestions, etc.
In general, there are two classes of such methods: ones that emit an error directly and ones that allow finer control over what to emit. For example, span_err emits the given error message at the given Span, but struct_span_err instead returns a DiagnosticBuilder.
DiagnosticBuilder allows you to add related notes and suggestions to an error before emitting it by calling the emit method. (Failing to either emit or cancel a DiagnosticBuilder will result in an ICE.) See the docs for more info on what you can do.
```rust
// Get a DiagnosticBuilder. This does _not_ emit an error yet.
let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

// In some cases, you might need to check if `sp` is generated by a macro to
// avoid printing weird errors about macro-generated code.

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    // Use the snippet to generate a suggested fix
    err.span_suggestion(suggestion_sp, "try using a qux here", format!("qux {}", snippet));
} else {
    // If we weren't able to generate a snippet, then emit a "help" message
    // instead of a concrete "suggestion". In practice this is unlikely to be
    // reached.
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

// emit the error
err.emit();
```
Suggestions
In addition to telling the user exactly why their code is wrong, it's often possible to tell them how to fix it. To this end, `DiagnosticBuilder` offers a structured suggestions API, which formats code suggestions pleasingly in the terminal, or (when the `--error-format json` flag is passed) as JSON for consumption by tools, most notably the Rust Language Server and `rustfix`.

Not all suggestions should be applied mechanically. Use the `span_suggestion` method of `DiagnosticBuilder` to make a suggestion. The last argument provides a hint to tools about whether the suggestion is mechanically applicable or not.
For example, to make our `qux` suggestion machine-applicable, we would do:
```rust
let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    err.span_suggestion(
        suggestion_sp,
        "try using a qux here",
        format!("qux {}", snippet),
        Applicability::MachineApplicable,
    );
} else {
    err.span_help(suggestion_sp, "you could use a qux here instead");
}
err.emit();
```
This might emit an error like:

```console
$ rustc mycode.rs
error[E0999]: oh no! this is an error!
 --> mycode.rs:3:5
  |
3 |     sad()
  |     ^ help: try using a qux here: `qux sad()`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0999`.
```
In some cases, like when the suggestion spans multiple lines or when there are multiple suggestions, the suggestions are displayed on their own:
```text
error[E0999]: oh no! this is an error!
 --> mycode.rs:3:5
  |
3 |     sad()
  |     ^
help: try using a qux here:
  |
3 |     qux sad()
  |     ^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0999`.
```
The possible values of `Applicability` are:

- `MachineApplicable`: Can be applied mechanically.
- `HasPlaceholders`: Cannot be applied mechanically because it has placeholder text in the suggestions. For example, "Try adding a type: `let x: <type>`".
- `MaybeIncorrect`: Cannot be applied mechanically because the suggestion may or may not be a good one.
- `Unspecified`: Cannot be applied mechanically because we don't know which of the above cases it falls into.
Lints
The compiler linting infrastructure is defined in the `rustc::lint` module.
Declaring a lint
The built-in compiler lints are defined in the `rustc_lint` crate. Each lint is defined as a `struct` that implements the `LintPass` trait. The trait implementation allows you to check certain syntactic constructs as the linter walks the source code. You can then choose to emit lints in a very similar way to compile errors. Finally, you register the lint to actually get it run by the compiler by using the `declare_lint!` macro.
For example, the following lint checks for uses of `while true { ... }` and suggests using `loop { ... }` instead.
```rust
// Declare a lint called `WHILE_TRUE`
declare_lint! {
    WHILE_TRUE,

    // warn-by-default
    Warn,

    // This string is the lint description
    "suggest using `loop { }` instead of `while true { }`"
}

// Define a struct and `impl LintPass` for it.
#[derive(Copy, Clone)]
pub struct WhileTrue;

impl LintPass for WhileTrue {
    fn get_lints(&self) -> LintArray {
        lint_array!(WHILE_TRUE)
    }
}

// LateLintPass has lots of methods. We only override the definition of
// `check_expr` for this lint because that's all we need, but you could
// override other methods for your own lint. See the rustc docs for a full
// list of methods.
impl<'a, 'tcx> LateLintPass<'a, 'tcx> for WhileTrue {
    fn check_expr(&mut self, cx: &LateContext, e: &hir::Expr) {
        if let hir::ExprWhile(ref cond, ..) = e.node {
            if let hir::ExprLit(ref lit) = cond.node {
                if let ast::LitKind::Bool(true) = lit.node {
                    if lit.span.ctxt() == SyntaxContext::empty() {
                        let msg = "denote infinite loops with `loop { ... }`";
                        let condition_span = cx.tcx.sess.source_map().def_span(e.span);
                        let mut err = cx.struct_span_lint(WHILE_TRUE, condition_span, msg);
                        err.span_suggestion_short(condition_span, "use `loop`", "loop".to_owned());
                        err.emit();
                    }
                }
            }
        }
    }
}
```
Edition-gated Lints
Sometimes we want to change the behavior of a lint in a new edition. To do this, we just add the transition to our invocation of `declare_lint!`:
```rust
declare_lint! {
    pub ANONYMOUS_PARAMETERS,
    Allow,
    "detects anonymous parameters",
    Edition::Edition2018 => Warn,
}
```
This makes the `ANONYMOUS_PARAMETERS` lint allow-by-default in the 2015 edition but warn-by-default in the 2018 edition.

Lints that represent an incompatibility (i.e. error) in the upcoming edition should also be registered as `FutureIncompatibilityLint`s in the `register_builtins` function in `rustc_lint::lib`.
Lint Groups
Lints can be turned on in groups. These groups are declared in the `register_builtins` function in `rustc_lint::lib`. The `add_lint_group!` macro is used to declare a new group.
For example,
```rust
add_lint_group!(sess,
                "nonstandard_style",
                NON_CAMEL_CASE_TYPES,
                NON_SNAKE_CASE,
                NON_UPPER_CASE_GLOBALS);
```
This defines the `nonstandard_style` group which turns on the listed lints. A user can turn on these lints with a `#![warn(nonstandard_style)]` attribute in the source code, or by passing `-W nonstandard-style` on the command line.
Linting early in the compiler
On occasion, you may need to define a lint that runs before the linting system has been initialized (e.g. during parsing or macro expansion). This is problematic because we need to have computed lint levels to know whether we should emit a warning or an error or nothing at all.
To solve this problem, we buffer the lints until the linting system is processed. `Session` and `ParseSess` both have `buffer_lint` methods that allow you to buffer a lint for later. The linting system automatically takes care of handling buffered lints later.
Thus, to define a lint that runs early in the compilation, one defines a lint like normal but invokes the lint with `buffer_lint`.
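For illustration, a buffered lint invocation looks roughly like the following sketch (the lint name, `node_id`, and `span` are placeholders, and the argument order should be checked against the actual `buffer_lint` signature):

```rust
// Rough sketch: instead of emitting the lint directly, we buffer it so the
// linting system can emit it once lint levels have been computed.
sess.buffer_lint(
    lint::builtin::SOME_EARLY_LINT, // placeholder lint name
    node_id,                        // the node the lint is attached to
    span,                           // where to report it
    "this will be emitted once the linting system is initialized",
);
```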
Linting even earlier in the compiler
The parser (`libsyntax`) is interesting in that it cannot have dependencies on any of the other `librustc*` crates. In particular, it cannot depend on `librustc::lint` or `librustc_lint`, where all of the compiler linting infrastructure is defined. That's troublesome!
To solve this, `libsyntax` defines its own buffered lint type, which `ParseSess::buffer_lint` uses. After macro expansion, these buffered lints are then dumped into the `Session::buffered_lints` used by the rest of the compiler.
Usage for buffered lints in `libsyntax` is pretty much the same as the rest of the compiler, with one exception: we cannot import the `LintId`s for lints we want to emit. Instead, the `BufferedEarlyLintId` type is used. If you are defining a new lint, you will want to add an entry to this enum. Then, add an appropriate mapping to the body of `Lint::from_parser_lint_id`.
JSON diagnostic output
The compiler accepts an `--error-format json` flag to output diagnostics as JSON objects (for the benefit of tools such as `cargo fix` or the RLS). It looks like this:
```console
$ rustc json_error_demo.rs --error-format json
{"message":"cannot add `&str` to `{integer}`","code":{"code":"E0277","explanation":"\nYou tried to use a type which doesn't implement some trait in a place which\nexpected that trait. Erroneous code example:\n\n```compile_fail,E0277\n// here we declare the Foo trait with a bar method\ntrait Foo {\n fn bar(&self);\n}\n\n// we now declare a function which takes an object implementing the Foo trait\nfn some_func<T: Foo>(foo: T) {\n foo.bar();\n}\n\nfn main() {\n // we now call the method with the i32 type, which doesn't implement\n // the Foo trait\n some_func(5i32); // error: the trait bound `i32 : Foo` is not satisfied\n}\n```\n\nIn order to fix this error, verify that the type you're using does implement\nthe trait. Example:\n\n```\ntrait Foo {\n fn bar(&self);\n}\n\nfn some_func<T: Foo>(foo: T) {\n foo.bar(); // we can now use this method since i32 implements the\n // Foo trait\n}\n\n// we implement the trait on the i32 type\nimpl Foo for i32 {\n fn bar(&self) {}\n}\n\nfn main() {\n some_func(5i32); // ok!\n}\n```\n\nOr in a generic context, an erroneous code example would look like:\n\n```compile_fail,E0277\nfn some_func<T>(foo: T) {\n println!(\"{:?}\", foo); // error: the trait `core::fmt::Debug` is not\n // implemented for the type `T`\n}\n\nfn main() {\n // We now call the method with the i32 type,\n // which *does* implement the Debug trait.\n some_func(5i32);\n}\n```\n\nNote that the error here is in the definition of the generic function: Although\nwe only call it with a parameter that does implement `Debug`, the compiler\nstill rejects the function: It must work with all possible input types. In\norder to make this example compile, we need to restrict the generic type we're\naccepting:\n\n```\nuse std::fmt;\n\n// Restrict the input type to types that implement Debug.\nfn some_func<T: fmt::Debug>(foo: T) {\n println!(\"{:?}\", foo);\n}\n\nfn main() {\n // Calling the method is still fine, as i32 implements Debug.\n some_func(5i32);\n\n // This would fail to compile now:\n // struct WithoutDebug;\n // some_func(WithoutDebug);\n}\n```\n\nRust only looks at the signature of the called function, as such it must\nalready specify all requirements that will be used for every type parameter.\n"},"level":"error","spans":[{"file_name":"json_error_demo.rs","byte_start":50,"byte_end":51,"line_start":4,"line_end":4,"column_start":7,"column_end":8,"is_primary":true,"text":[{"text":" a + b","highlight_start":7,"highlight_end":8}],"label":"no implementation for `{integer} + &str`","suggested_replacement":null,"suggestion_applicability":null,"expansion":null}],"children":[{"message":"the trait `std::ops::Add<&str>` is not implemented for `{integer}`","code":null,"level":"help","spans":[],"children":[],"rendered":null}],"rendered":"error[E0277]: cannot add `&str` to `{integer}`\n --> json_error_demo.rs:4:7\n |\n4 | a + b\n | ^ no implementation for `{integer} + &str`\n |\n = help: the trait `std::ops::Add<&str>` is not implemented for `{integer}`\n\n"}
{"message":"aborting due to previous error","code":null,"level":"error","spans":[],"children":[],"rendered":"error: aborting due to previous error\n\n"}
{"message":"For more information about this error, try `rustc --explain E0277`.","code":null,"level":"","spans":[],"children":[],"rendered":"For more information about this error, try `rustc --explain E0277`.\n"}
Note that the output is a series of lines, each of which is a JSON object, but the series of lines taken together is, unfortunately, not valid JSON, thwarting tools and tricks (such as piping to `python3 -m json.tool`) that require such. (One speculates that this was intentional for LSP performance purposes, so that each line/object can be sent to RLS as it is flushed?)
Also note the "rendered" field, which contains the "human" output as a string; this was introduced so that UI tests could both make use of the structured JSON and see the "human" output (well, sans colors) without having to compile everything twice.
The JSON emitter currently lives in `libsyntax/json.rs`. (But arguably it should live in `librustc_errors` along with the "human" emitter? It's not obvious to the present author why it wasn't moved from `libsyntax` to `librustc_errors` at the same time the "human" emitter was moved.)
The JSON emitter defines its own `Diagnostic` struct (and sub-structs) for the JSON serialization. Don't confuse this with `errors::Diagnostic`!
Part 2: How rustc works
This part of the guide describes how the compiler works. It goes through everything from high-level structure of the compiler to how each stage of compilation works.
This section should be friendly to both readers interested in the end-to-end process of compilation and readers interested in learning about a specific system they wish to contribute to. If anything is unclear, feel free to file an issue on the rustc-guide repo or contact the compiler team, as detailed in this chapter from Part 1.
High-level overview of the compiler source
Crate structure
The main Rust repository consists of a `src` directory, under which there live many crates. These crates contain the sources for the standard library and the compiler. This document, of course, focuses on the latter.
Rustc consists of a number of crates, including `syntax`, `rustc`, `rustc_back`, `rustc_codegen`, `rustc_driver`, and many more. The source for each crate can be found in a directory like `src/libXXX`, where `XXX` is the crate name.
(N.B. The names and divisions of these crates are not set in stone and may change over time. For the time being, we tend towards a finer-grained division to help with compilation time, though as incremental compilation improves, that may change.)
The dependency structure of these crates is roughly a diamond:
```text
                    rustc_driver
                  /      |       \
                 /       |        \
                /        |         \
               /         v          \
rustc_codegen    rustc_borrowck  ...  rustc_metadata
               \         |          /
                \        |         /
                 \       |        /
                  \      v       /
                       rustc
                         |
                         v
                      syntax
                     /      \
                    /        \
              syntax_pos  syntax_ext
```
The `rustc_driver` crate, at the top of this lattice, is effectively the "main" function for the Rust compiler. It doesn't have much "real code", but instead ties together all of the code defined in the other crates and defines the overall flow of execution. (As we transition more and more to the query model, however, the "flow" of compilation is becoming less centrally defined.)
At the other extreme, the `rustc` crate defines the common and pervasive data structures that all the rest of the compiler uses (e.g. how to represent types, traits, and the program itself). It also contains some amount of the compiler itself, although that is relatively limited.
Finally, all the crates in the bulge in the middle define the bulk of the compiler – they all depend on `rustc`, so that they can make use of the various types defined there, and they export public routines that `rustc_driver` will invoke as needed (more and more, what these crates export are "query definitions", but those are covered later on).
Below `rustc` lie various crates that make up the parser and error reporting mechanism. For historical reasons, these crates do not have the `rustc_` prefix, but they are really just as much an internal part of the compiler and not intended to be stable (though they do wind up getting used by some crates in the wild; a practice we hope to gradually phase out).
Each crate has a `README.md` file that describes, at a high level, what it contains, and tries to give some kind of explanation (some better than others).
The main stages of compilation
The Rust compiler is in a bit of transition right now. It used to be a purely "pass-based" compiler, where we ran a number of passes over the entire program, and each did a particular check or transformation. We are gradually replacing this pass-based code with an alternative setup based on on-demand queries. In the query-model, we work backwards, executing a query that expresses our ultimate goal (e.g. "compile this crate"). This query in turn may make other queries (e.g. "get me a list of all modules in the crate"). Those queries make other queries that ultimately bottom out in the base operations, like parsing the input, running the type-checker, and so forth. This on-demand model permits us to do exciting things like only do the minimal amount of work needed to type-check a single function. It also helps with incremental compilation. (For details on defining queries, check out the query model.)
Regardless of the general setup, the basic operations that the compiler must perform are the same. The only thing that changes is whether these operations are invoked front-to-back, or on demand. In order to compile a Rust crate, these are the general steps that we take:
- Parsing input
  - this processes the `.rs` files and produces the AST ("abstract syntax tree")
  - the AST is defined in `src/libsyntax/ast.rs`. It is intended to match the lexical syntax of the Rust language quite closely.
- Name resolution, macro expansion, and configuration
  - once parsing is complete, we process the AST recursively, resolving paths and expanding macros. This same process also processes `#[cfg]` nodes, and hence may strip things out of the AST as well.
- Lowering to HIR
  - Once name resolution completes, we convert the AST into the HIR, or "high-level intermediate representation". The HIR is defined in `src/librustc/hir/`; that module also includes the lowering code.
  - The HIR is a lightly desugared variant of the AST. It is more processed than the AST and more suitable for the analyses that follow. It is not required to match the syntax of the Rust language.
  - As a simple example, in the AST, we preserve the parentheses that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse into distinct trees, even though they are equivalent. In the HIR, however, parentheses nodes are removed, and those two expressions are represented in the same way.
- Type-checking and subsequent analyses
  - An important step in processing the HIR is to perform type checking. This process assigns types to every HIR expression, for example, and also is responsible for resolving some "type-dependent" paths, such as field accesses (`x.f` – we can't know what field `f` is being accessed until we know the type of `x`) and associated type references (`T::Item` – we can't know what type `Item` is until we know what `T` is).
  - Type checking creates "side-tables" (`TypeckTables`) that include the types of expressions, the way to resolve methods, and so forth.
  - After type-checking, we can do other analyses, such as privacy checking.
- Lowering to MIR and post-processing
  - Once type-checking is done, we can lower the HIR into MIR ("middle IR"), which is a very desugared version of Rust, well suited to borrowck but also to certain high-level optimizations.
- Translation to LLVM and LLVM optimizations
  - From MIR, we can produce LLVM IR.
  - LLVM then runs its various optimizations, which produces a number of `.o` files (one for each "codegen unit").
- Linking
  - Finally, those `.o` files are linked together.
The Rustc Driver and Interface
The `rustc_driver` is essentially `rustc`'s `main()` function. It acts as the glue for running the various phases of the compiler in the correct order, using the interface defined in the `rustc_interface` crate.

The `rustc_interface` crate provides external users with an (unstable) API for running code at particular times during the compilation process, allowing third parties to effectively use `rustc`'s internals as a library for analysing a crate or emulating the compiler in-process (e.g. the RLS or rustdoc).
For those using `rustc` as a library, the `interface::run_compiler()` function is the main entrypoint to the compiler. It takes a configuration for the compiler and a closure that takes a `Compiler`. `run_compiler` creates a `Compiler` from the configuration and passes it to the closure. Inside the closure, you can use the `Compiler` to drive queries to compile a crate and get the results. This is what the `rustc_driver` does too. You can see what queries are currently available through the rustdocs for `Compiler`.
You can see an example of how to use these queries by looking at the `rustc_driver` implementation, specifically the `rustc_driver::run_compiler` function (not to be confused with `interface::run_compiler`). The `rustc_driver::run_compiler` function takes a bunch of command-line args and some other configurations and drives the compilation to completion.

`rustc_driver::run_compiler` also takes a `Callbacks`. In the past, when `rustc_driver::run_compiler` was the primary way to use the compiler as a library, these callbacks were used to have some custom code run after different phases of the compilation. If you read Appendix A, you may notice the use of the types `CompilerCalls` and `CompileController`, which no longer exist; `Callbacks` replaces this functionality.
Warning: By their very nature, the internal compiler APIs are always going to be unstable. That said, we do try not to break things unnecessarily.
A Note On Lifetimes
The Rust compiler is a fairly large program containing lots of big data structures (e.g. the AST, HIR, and the type system) and as such, arenas and references are heavily relied upon to minimize unnecessary memory use. This manifests itself in the way people can plug into the compiler, preferring a "push"-style API (callbacks) instead of the more Rust-ic "pull" style (think the `Iterator` trait).
Thread-local storage and interning are used a lot through the compiler to reduce duplication while also preventing a lot of the ergonomic issues due to many pervasive lifetimes. The `rustc::ty::tls` module is used to access these thread-locals, although you should rarely need to touch it.
The walking tour of rustdoc
Rustdoc actually uses the rustc internals directly. It lives in-tree with the compiler and standard library. This chapter is about how it works.
Rustdoc is implemented entirely within the crate `librustdoc`. It runs the compiler up to the point where we have an internal representation of a crate (HIR) and the ability to run some queries about the types of items. HIR and queries are discussed in the linked chapters.
`librustdoc` performs two major steps after that to render a set of documentation:
- "Clean" the AST into a form that's more suited to creating documentation (and slightly more resistant to churn in the compiler).
- Use this cleaned AST to render a crate's documentation, one page at a time.
Naturally, there's more than just this, and those descriptions simplify out lots of details, but that's the high-level overview.
(Side note: `librustdoc` is a library crate! The `rustdoc` binary is created using the project in `src/tools/rustdoc`. Note that literally all that does is call the `main()` that's in this crate's `lib.rs`, though.)
Cheat sheet
- Use `./x.py build --stage 1 src/libstd src/tools/rustdoc` to make a usable rustdoc you can run on other projects.
  - Add `src/libtest` to be able to use `rustdoc --test`.
  - If you've used `rustup toolchain link local /path/to/build/$TARGET/stage1` previously, then after the previous build command, `cargo +local doc` will Just Work.
- Use `./x.py doc --stage 1 src/libstd` to use this rustdoc to generate the standard library docs.
  - The completed docs will be available in `build/$TARGET/doc/std`, though the bundle is meant to be used as though you would copy out the `doc` folder to a web server, since that's where the CSS/JS and landing page are.
- Most of the HTML printing code is in `html/format.rs` and `html/render.rs`. It's in a bunch of `fmt::Display` implementations and supplementary functions.
- The types that got `Display` impls above are defined in `clean/mod.rs`, right next to the custom `Clean` trait used to process them out of the rustc HIR.
- The bits specific to using rustdoc as a test harness are in `test.rs`.
- The Markdown renderer is loaded up in `html/markdown.rs`, including functions for extracting doctests from a given block of Markdown.
- The tests on rustdoc output are located in `src/test/rustdoc`, where they're handled by the test runner of rustbuild and the supplementary script `src/etc/htmldocck.py`.
- Tests on search index generation are located in `src/test/rustdoc-js`, as a series of JavaScript files that encode queries on the standard library search index and expected results.
From crate to clean
In `core.rs` are two central items: the `DocContext` struct and the `run_core` function. The latter is where rustdoc calls out to rustc to compile a crate to the point where rustdoc can take over. The former is a state container used when crawling through a crate to gather its documentation.
The main process of crate crawling is done in `clean/mod.rs` through several implementations of the `Clean` trait defined within. This is a conversion trait, which defines one method:
```rust
pub trait Clean<T> {
    fn clean(&self, cx: &DocContext) -> T;
}
```
`clean/mod.rs` also defines the types for the "cleaned" AST used later on to render documentation pages. Each usually accompanies an implementation of `Clean` that takes some AST or HIR type from rustc and converts it into the appropriate "cleaned" type. "Big" items like modules or associated items may have some extra processing in their `Clean` implementation, but for the most part these impls are straightforward conversions. The "entry point" to this module is the `impl Clean<Crate> for visit_ast::RustdocVisitor`, which is called by `run_core` above.
You see, I actually lied a little earlier: there's another AST transformation that happens before the events in `clean/mod.rs`. In `visit_ast.rs` is the type `RustdocVisitor`, which actually crawls a `hir::Crate` to get the first intermediate representation, defined in `doctree.rs`. This pass is mainly to get a few intermediate wrappers around the HIR types and to process visibility and inlining. This is where `#[doc(inline)]`, `#[doc(no_inline)]`, and `#[doc(hidden)]` are processed, as well as the logic for whether a `pub use` should get the full page or a "Reexport" line in the module page.
The other major thing that happens in `clean/mod.rs` is the collection of doc comments and `#[doc=""]` attributes into a separate field of the `Attributes` struct, present on anything that gets hand-written documentation. This makes it easier to collect this documentation later in the process.
The primary output of this process is a `clean::Crate` with a tree of `Item`s which describe the publicly-documentable items in the target crate.
Hot potato
Before moving on to the next major step, a few important "passes" occur over the documentation. These do things like combine the separate "attributes" into a single string and strip leading whitespace to make the document easier on the markdown parser, or drop items that are not public or deliberately hidden with `#[doc(hidden)]`. These are all implemented in the `passes/` directory, one file per pass. By default, all of these passes are run on a crate, but the ones regarding dropping private/hidden items can be bypassed by passing `--document-private-items` to rustdoc. Note that unlike the previous set of AST transformations, the passes happen on the cleaned crate.
(Strictly speaking, you can fine-tune the passes run and even add your own, but we're trying to deprecate that. If you need finer-grain control over these passes, please let us know!)
Here is the current (as of this writing) list of passes:
- `propagate-doc-cfg` - propagates `#[doc(cfg(...))]` to child items.
- `collapse-docs` concatenates all document attributes into one document attribute. This is necessary because each line of a doc comment is given as a separate doc attribute, and this will combine them into a single string with line breaks between each attribute.
- `unindent-comments` removes excess indentation on comments in order for markdown to like it. This is necessary because the convention for writing documentation is to provide a space between the `///` or `//!` marker and the text, and stripping that leading space will make the text easier to parse by the Markdown parser. (In the past, the markdown parser used was not Commonmark-compliant, which caused annoyances with extra whitespace but this seems to be less of an issue today.)
- `strip-priv-imports` strips all private import statements (`use`, `extern crate`) from a crate. This is necessary because rustdoc will handle public imports by either inlining the item's documentation to the module or creating a "Reexports" section with the import in it. The pass ensures that all of these imports are actually relevant to documentation.
- `strip-hidden` and `strip-private` strip all `doc(hidden)` and private items from the output. `strip-private` implies `strip-priv-imports`. Basically, the goal is to remove items that are not relevant for public documentation.
From clean to crate
This is where the "second phase" in rustdoc begins. This phase primarily lives in the `html/` folder, and it all starts with `run()` in `html/render.rs`. This code is responsible for setting up the `Context`, `SharedContext`, and `Cache` which are used during rendering, copying out the static files which live in every rendered set of documentation (things like the fonts, CSS, and JavaScript that live in `html/static/`), creating the search index, and printing out the source code rendering, before beginning the process of rendering all the documentation for the crate.
Several functions implemented directly on `Context` take the `clean::Crate` and set up some state between rendering items or recursing on a module's child items. From here the "page rendering" begins, via an enormous `write!()` call in `html/layout.rs`. The parts that actually generate HTML from the items and documentation occur within a series of `std::fmt::Display` implementations and functions that pass around a `&mut std::fmt::Formatter`. The top-level implementation that writes out the page body is the `impl<'a> fmt::Display for Item<'a>` in `html/render.rs`, which switches out to one of several `item_*` functions based on the kind of `Item` being rendered.
Depending on what kind of rendering code you're looking for, you'll probably find it either in `html/render.rs` for major items like "what sections should I print for a struct page" or `html/format.rs` for smaller component pieces like "how should I print a where clause as part of some other item".
Whenever rustdoc comes across an item that should print hand-written documentation alongside, it calls out to `html/markdown.rs` which interfaces with the Markdown parser. This is exposed as a series of types that wrap a string of Markdown, and implement `fmt::Display` to emit HTML text. It takes special care to enable certain features like footnotes and tables and add syntax highlighting to Rust code blocks (via `html/highlight.rs`) before running the Markdown parser. There's also a function in here (`find_testable_code`) that specifically scans for Rust code blocks so the test-runner code can find all the doctests in the crate.
From soup to nuts
(alternate title: "An unbroken thread that stretches from those first `Cell`s to us")
It's important to note that the AST cleaning can ask the compiler for information (crucially, `DocContext` contains a `TyCtxt`), but page rendering cannot. The `clean::Crate` created within `run_core` is passed outside the compiler context before being handed to `html::render::run`. This means that a lot of the "supplementary data" that isn't immediately available inside an item's definition, like which trait is the `Deref` trait used by the language, needs to be collected during cleaning, stored in the `DocContext`, and passed along to the `SharedContext` during HTML rendering. This manifests as a bunch of shared state, context variables, and `RefCell`s.
Also of note is that some items that come from "asking the compiler" don't go directly into the `DocContext` - for example, when loading items from a foreign crate, rustdoc will ask about trait implementations and generate new `Item`s for the impls based on that information. This goes directly into the returned `Crate` rather than roundabout through the `DocContext`. This way, these implementations can be collected alongside the others, right before rendering the HTML.
Other tricks up its sleeve
All this describes the process for generating HTML documentation from a Rust crate, but there are a couple of other major modes that rustdoc runs in. It can also be run on a standalone Markdown file, or it can run doctests on Rust code or standalone Markdown files. For the former, it shortcuts straight to `html/markdown.rs`, optionally including a mode which inserts a Table of Contents to the output HTML.
For the latter, rustdoc runs a similar partial-compilation to get relevant documentation in `test.rs`, but instead of going through the full clean and render process, it runs a much simpler crate walk to grab just the hand-written documentation. Combined with the aforementioned `find_testable_code` in `html/markdown.rs`, it builds up a collection of tests to run before handing them off to the libtest test runner. One notable location in `test.rs` is the function `make_test`, which is where hand-written doctests get transformed into something that can be executed.

Some extra reading about `make_test` can be found here.
Dotting i's and crossing t's
So that's rustdoc's code in a nutshell, but there are more things in the repo that deal with it. Since we have the full `compiletest` suite at hand, there's a set of tests in `src/test/rustdoc` that make sure the final HTML is what we expect in various situations. These tests also use a supplementary script, `src/etc/htmldocck.py`, that allows it to look through the final HTML using XPath notation to get a precise look at the output. The full description of all the commands available to rustdoc tests is in `htmldocck.py`.
In addition, there are separate tests for the search index and rustdoc's ability to query it. The files in `src/test/rustdoc-js` each contain a different search query and the expected results, broken out by search tab. These files are processed by a script in `src/tools/rustdoc-js` and the Node.js runtime. These tests don't have as thorough of a writeup, but a broad example that features results in all tabs can be found in `basic.js`. The basic idea is that you match a given `QUERY` with a set of `EXPECTED` results, complete with the full item path of each item.
Queries: demand-driven compilation
As described in the high-level overview of the compiler, the Rust compiler is currently transitioning from a traditional "pass-based" setup to a "demand-driven" system. The Compiler Query System is the key to our new demand-driven organization. The idea is pretty simple. You have various queries that compute things about the input – for example, there is a query called `type_of(def_id)` that, given the def-id of some item, will compute the type of that item and return it to you.
Query execution is memoized – so the first time you invoke a query, it will go do the computation, but the next time, the result is returned from a hashtable. Moreover, query execution fits nicely into incremental computation; the idea is roughly that, when you do a query, the result may be returned to you by loading stored data from disk (but that's a separate topic we won't discuss further here).
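To see what this memoization amounts to, here is a toy model (purely illustrative; not rustc's actual cache implementation):

```rust
use std::collections::HashMap;

// Toy model: the first call with a given key runs the computation; later
// calls return the cached result.
struct QueryContext {
    type_of_cache: HashMap<u32, String>, // toy "def-id" -> toy "type"
}

impl QueryContext {
    fn type_of(&mut self, def_id: u32) -> String {
        if let Some(cached) = self.type_of_cache.get(&def_id) {
            return cached.clone(); // cache hit: no recomputation
        }
        let result = format!("type of item {}", def_id); // stand-in computation
        self.type_of_cache.insert(def_id, result.clone());
        result
    }
}
```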
The overall vision is that, eventually, the entire compiler control-flow will be query driven. There will effectively be one top-level query ("compile") that will run compilation on a crate; this will in turn demand information about that crate, starting from the end. For example:
- This "compile" query might demand to get a list of codegen-units (i.e. modules that need to be compiled by LLVM).
- But computing the list of codegen-units would invoke some subquery that returns the list of all modules defined in the Rust source.
- That query in turn would invoke something asking for the HIR.
- This keeps going further and further back until we wind up doing the actual parsing.
However, that vision is not fully realized. Still, big chunks of the compiler (for example, generating MIR) work exactly like this.
The Query Evaluation Model in Detail
The Query Evaluation Model in Detail chapter gives a more in-depth description of what queries are and how they work. If you intend to write a query of your own, this is a good read.
Invoking queries
Invoking a query is simple. The tcx ("type context") offers a method for each defined query. So, for example, to invoke the `type_of` query, you would just do this:
```rust
let ty = tcx.type_of(some_def_id);
```
How the compiler executes a query
So you may be wondering what happens when you invoke a query method. The answer is that, for each query, the compiler maintains a cache – if your query has already been executed, then the answer is simple: we clone the return value out of the cache and return it (therefore, you should try to ensure that the return types of queries are cheaply cloneable; insert an `Rc` if necessary).
Providers
If, however, the query is not in the cache, then the compiler will try to find a suitable provider. A provider is a function that has been defined and linked into the compiler somewhere that contains the code to compute the result of the query.
Providers are defined per-crate. The compiler maintains, internally, a table of providers for every crate, at least conceptually. Right now, there are really two sets: the providers for queries about the local crate (that is, the one being compiled) and providers for queries about external crates (that is, dependencies of the local crate). Note that what determines the crate that a query is targeting is not the kind of query, but the key. For example, when you invoke `tcx.type_of(def_id)`, that could be a local query or an external query, depending on what crate the `def_id` is referring to (see the `self::keys::Key` trait for more information on how that works).
Providers always have the same signature:
```rust
fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
                       key: QUERY_KEY)
                       -> QUERY_RESULT
{
    ...
}
```
Providers take two arguments: the `tcx` and the query key. Note also that they take the global tcx (i.e. they use the `'tcx` lifetime twice), rather than taking a tcx with some active inference context. They return the result of the query.
How providers are set up
When the tcx is created, it is given the providers by its creator using the `Providers` struct. This struct is generated by the macros here, but it is basically a big list of function pointers:
```rust
struct Providers {
    type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
    ...
}
```
At present, we have one copy of the struct for local crates, and one for external crates, though the plan is that we may eventually have one per crate.
These `Providers` structs are ultimately created and populated by `librustc_driver`, but it does this by distributing the work throughout the other `rustc_*` crates. This is done by invoking various `provide` functions. These functions tend to look something like this:
```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}
```
That is, they take an `&mut Providers` and mutate it in place. Usually we use the formulation above just because it looks nice, but you could as well do `providers.type_of = type_of`, which would be equivalent. (Here, `type_of` would be a top-level function, defined as we saw before.) So, if we want to add a provider for some other query, let's call it `fubar`, into the crate above, we might modify the `provide()` function like so:
```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> Fubar<'tcx> { ... }
```
N.B. Most of the `rustc_*` crates only provide local providers. Almost all extern providers wind up going through the `rustc_metadata` crate, which loads the information from the crate metadata. But in some cases there are crates that provide queries for both local and external crates, in which case they define both a `provide` and a `provide_extern` function that `rustc_driver` can invoke.
Adding a new kind of query
So suppose you want to add a new kind of query. How do you do so? Well, defining a query takes place in two steps:
- first, you have to specify the query name and arguments; and then,
- you have to supply query providers where needed.
To specify the query name and arguments, you simply add an entry to the big macro invocation in `src/librustc/query/mod.rs`, which looks something like:
```rust
rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            cache { key.is_local() }
        }
    }

    ...
}
```
Queries are grouped into categories (`Other`, `Codegen`, `TypeChecking`, etc.). Each group contains one or more queries. Each query definition is broken up like this:
```text
query type_of(key: DefId) -> Ty<'tcx> { ... }
^^    ^^^^^^^ ^^^^^^^^^^^    ^^^^^^^^  ^^^
|     |       |              |         |
|     |       |              |         query modifiers
|     |       |              result type of query
|     |       query key type
|     name of query
query keyword
```
Let's go over them one by one:
- Query keyword: indicates a start of a query definition.
- Name of query: the name of the query method (`tcx.type_of(..)`). Also used as the name of a struct (`ty::queries::type_of`) that will be generated to represent this query.
- Query key type: the type of the argument to this query. This type must implement the `ty::query::keys::Key` trait, which defines (for example) how to map it to a crate, and so forth.
- Result type of query: the type produced by this query. This type should (a) not use `RefCell` or other interior mutability and (b) be cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for non-trivial data types.
  - The one exception to those rules is the `ty::steal::Steal` type, which is used to cheaply modify MIR in place. See the definition of `Steal` for more details. New uses of `Steal` should not be added without alerting `@rust-lang/compiler`.
- Query modifiers: various flags and options that customize how the query is processed.
So, to add a query:
- Add an entry to `rustc_queries!` using the format above.
- Link the provider by modifying the appropriate `provide` method; or add a new one if needed and ensure that `rustc_driver` is invoking it.
Query structs and descriptions
For each kind, the `rustc_queries` macro will generate a "query struct" named after the query. This struct is a kind of placeholder describing the query. Each such struct implements the `self::config::QueryConfig` trait, which has associated types for the key/value of that particular query. Basically the code generated looks something like this:
```rust
// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { data: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = Ty<'tcx>;

    const NAME: QueryName = QueryName::type_of;
    const CATEGORY: ProfileCategory = ProfileCategory::Other;
}
```
There is an additional trait that you may wish to implement called `self::config::QueryDescription`. This trait is used during cycle errors to give a "human readable" name for the query, so that we can summarize what was happening when the cycle occurred. Implementing this trait is optional if the query key is `DefId`, but if you don't implement it, you get a pretty generic error ("processing `foo`...").
You can put new impls into the `config` module. They look something like this:
```rust
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.def_path_str(key))
    }
}
```
Another option is to add a `desc` modifier:
```rust
rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
        }
    }
}
```
The `rustc_queries` macro will generate an appropriate `impl` automatically.
The Query Evaluation Model in Detail
This chapter provides a deeper dive into the abstract model queries are built on. It does not go into implementation details but tries to explain the underlying logic. The examples here, therefore, have been stripped down and simplified and don't directly reflect the compiler's internal APIs.
What is a query?
Abstractly we view the compiler's knowledge about a given crate as a "database" and queries are the way of asking the compiler questions about it, i.e. we "query" the compiler's "database" for facts.
However, there's something special to this compiler database: It starts out empty and is filled on-demand when queries are executed. Consequently, a query must know how to compute its result if the database does not contain it yet. For doing so, it can access other queries and certain input values that the database is pre-filled with on creation.
A query thus consists of the following things:
- A name that identifies the query
- A "key" that specifies what we want to look up
- A result type that specifies what kind of result it yields
- A "provider" which is a function that specifies how the result is to be computed if it isn't already present in the database.
As an example, the name of the `type_of` query is `type_of`, its query key is a `DefId` identifying the item we want to know the type of, the result type is `Ty<'tcx>`, and the provider is a function that, given the query key and access to the rest of the database, can compute the type of the item identified by the key.
So in some sense a query is just a function that maps the query key to the corresponding result. However, we have to apply some restrictions in order for this to be sound:
- The key and result must be immutable values.
- The provider function must be a pure function, that is, for the same key it must always yield the same result.
- The only parameters a provider function takes are the key and a reference to the "query context" (which provides access to rest of the "database").
The database is built up lazily by invoking queries. The query providers will invoke other queries, for which the result is either already cached or computed by calling another query provider. These query provider invocations conceptually form a directed acyclic graph (DAG) at the leaves of which are input values that are already known when the query context is created.
Caching/Memoization
Results of query invocations are "memoized" which means that the query context will cache the result in an internal table and, when the query is invoked with the same query key again, will return the result from the cache instead of running the provider again.
This caching is crucial for making the query engine efficient. Without memoization the system would still be sound (that is, it would yield the same results) but the same computations would be done over and over again.
Memoization is one of the main reasons why query providers have to be pure functions. If calling a provider function could yield different results for each invocation (because it accesses some global mutable state) then we could not memoize the result.
Input data
When the query context is created, it is still empty: No queries have been executed, no results are cached. But the context already provides access to "input" data, i.e. pieces of immutable data that were computed before the context was created and that queries can access to do their computations. Currently this input data consists mainly of the HIR map and the command-line options the compiler was invoked with. In the future, inputs will just consist of command-line options and a list of source files -- the HIR map will itself be provided by a query which processes these source files.
Without inputs, queries would live in a void without anything to compute their result from (remember, query providers only have access to other queries and the context but not any other outside state or information).
For a query provider, input data and results of other queries look exactly the same: It just tells the context "give me the value of X". Because input data is immutable, the provider can rely on it being the same across different query invocations, just as is the case for query results.
An example execution trace of some queries
How does this DAG of query invocations come into existence? At some point the compiler driver will create the, as yet empty, query context. It will then, from outside of the query system, invoke the queries it needs to perform its task. This looks something like the following:
```rust
fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}
```
The `type_check_crate` query provider would look something like the following:
```rust
fn type_check_crate_provider(tcx, _key: ()) {
    let list_of_items = tcx.hir_map.list_of_items();

    for item_def_id in list_of_items {
        tcx.type_check_item(item_def_id);
    }
}
```
We see that the `type_check_crate` query accesses input data (`tcx.hir_map.list_of_items()`) and invokes other queries (`type_check_item`). The `type_check_item` invocations will themselves access input data and/or invoke other queries, so that in the end the DAG of query invocations will be built up backwards from the node that was initially executed:
```text
         (2)                                                 (1)
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
    (5)             (4)                  (3)                   |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                    |                                          |
                    +-----------------+                        |
                                      |                        |
    (7)             (6)               v  (8)                   |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

  // (x) denotes invocation order
```
We also see that often a query result can be read from the cache: `type_of(bar)` was computed for `type_check_item(foo)`, so when `type_check_item(bar)` needs it, it is already in the cache.
Query results stay cached in the query context as long as the context lives. So if the compiler driver invoked another query later on, the above graph would still exist and already executed queries would not have to be re-done.
Cycles
Earlier we stated that query invocations form a DAG. However, it would be easy to form a cyclic graph by, for example, having a query provider like the following:
```rust
fn cyclic_query_provider(tcx, key) -> u32 {
    // Invoke the same query with the same key again
    tcx.cyclic_query(key)
}
```
Since query providers are regular functions, this would behave much as expected: evaluation would get stuck in an infinite recursion. A query like this would not be very useful either. However, sometimes certain kinds of invalid user input can result in queries being called in a cyclic way. The query engine includes a check for cyclic invocations and, because cycles are an irrecoverable error, will abort execution with a "cycle error" message that tries to be human readable.
At some point the compiler had a notion of "cycle recovery", that is, one could "try" to execute a query and if it ended up causing a cycle, proceed in some other fashion. However, this was later removed because it is not entirely clear what the theoretical consequences of this are, especially regarding incremental compilation.
"Steal" Queries
Some queries have their result wrapped in a `Steal<T>` struct. These queries behave exactly the same as regular queries, with one exception: their result is expected to be "stolen" out of the cache at some point, meaning some other part of the program is taking ownership of it and the result cannot be accessed anymore.
This stealing mechanism exists purely as a performance optimization because some result values are too costly to clone (e.g. the MIR of a function). It seems like result stealing would violate the condition that query results must be immutable (after all we are moving the result value out of the cache) but it is OK as long as the mutation is not observable. This is achieved by two things:
- Before a result is stolen, we make sure to eagerly run all queries that might ever need to read that result. This has to be done manually by calling those queries.
- Whenever a query tries to access a stolen result, we make the compiler ICE so that such a condition cannot go unnoticed.
This is not an ideal setup because of the manual intervention needed, so it should be used sparingly and only when it is well known which queries might access a given result. In practice, however, stealing has not turned out to be much of a maintenance burden.
To summarize: "Steal queries" break some of the rules in a controlled way. There are checks in place that make sure that nothing can go silently wrong.
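To make the semantics concrete, here is a minimal sketch of a `Steal`-like wrapper (illustrative only; the real type lives in `ty::steal` and differs in detail):

```rust
use std::cell::{Ref, RefCell};

// Simplified model: the value can be read until someone steals it; any read
// after stealing panics, mirroring the ICE described above.
pub struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    pub fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Borrow the value; panics if it has already been stolen.
    pub fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read a stolen value")
        })
    }

    /// Take ownership of the value, leaving the cache entry empty.
    pub fn steal(&self) -> T {
        self.value
            .borrow_mut()
            .take()
            .expect("attempted to steal an already-stolen value")
    }
}
```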
Parallel Query Execution
The query model has some properties that make it actually feasible to evaluate multiple queries in parallel without too much of an effort:
- All data a query provider can access is accessed via the query context, so the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by different threads concurrently.
The nightly compiler already implements parallel query evaluation as follows:
When a query `foo` is evaluated, the cache table for `foo` is locked.

- If there already is a result, we can clone it, release the lock, and we are done.
- If there is no cache entry and no other active query invocation computing the same result, we mark the key as being "in progress", release the lock, and start evaluating.
- If there is another query invocation for the same key in progress, we release the lock, and just block the thread until the other invocation has computed the result we are waiting for. This cannot deadlock because, as mentioned before, query invocations form a DAG. Some thread will always make progress.
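Here is a simplified sketch of that protocol for a single query's cache table (illustrative; rustc's real implementation is considerably more sophisticated):

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Condvar, Mutex};

// Sketch of the three cases described above, for one query's cache.
enum Entry<V> {
    InProgress,
    Done(V),
}

struct QueryCache<K, V> {
    table: Mutex<HashMap<K, Entry<V>>>,
    cond: Condvar,
}

impl<K: Hash + Eq + Clone, V: Clone> QueryCache<K, V> {
    fn get_or_compute(&self, key: K, provider: impl FnOnce() -> V) -> V {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(&key) {
                // Case 1: a result is already cached; clone it and we're done.
                Some(Entry::Done(v)) => return v.clone(),
                // Case 3: another thread is computing this key; block until it
                // finishes. No deadlock: query invocations form a DAG.
                Some(Entry::InProgress) => {
                    table = self.cond.wait(table).unwrap();
                }
                // Case 2: mark the key "in progress", release the lock, and
                // evaluate the provider ourselves.
                None => {
                    table.insert(key.clone(), Entry::InProgress);
                    drop(table);
                    let value = provider();
                    let mut table = self.table.lock().unwrap();
                    table.insert(key, Entry::Done(value.clone()));
                    self.cond.notify_all();
                    return value;
                }
            }
        }
    }
}
```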
Incremental compilation
The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. We'll start by describing a slightly simplified variant of the real thing – the "basic algorithm" – and then describe some possible improvements.
The basic algorithm
The basic algorithm is called the red-green algorithm[^1]. The high-level idea is that, after each run of the compiler, we will save the results of all the queries that we do, as well as the query DAG. The query DAG is a DAG that indexes which queries executed which other queries. So, for example, there would be an edge from a query Q1 to another query Q2 if computing Q1 required computing Q2 (note that because queries cannot depend on themselves, this results in a DAG and not a general graph).
On the next run of the compiler, then, we can sometimes reuse these query results to avoid re-executing a query. We do this by assigning every query a color:
- If a query is colored red, that means that its result during this compilation has changed from the previous compilation.
- If a query is colored green, that means that its result is the same as the previous compilation.
There are two key insights here:
- First, if all the inputs to query Q are colored green, then the query Q must result in the same value as last time and hence need not be re-executed (or else the compiler is not deterministic).
- Second, even if some inputs to a query change, it may be that it still produces the same result as the previous compilation. In particular, the query may only use part of its input.
  - Therefore, after executing a query, we always check whether it produced the same result as the previous time. If it did, we can still mark the query as green, and hence avoid re-executing dependent queries.
The try-mark-green algorithm
At the core of incremental compilation is an algorithm called "try-mark-green". It has the job of determining the color of a given query Q (which must not have yet been executed). In cases where Q has red inputs, determining Q's color may involve re-executing Q so that we can compare its output, but if all of Q's inputs are green, then we can conclude that Q must be green without re-executing it or inspecting its value at all. In the compiler, this allows us to avoid deserializing the result from disk when we don't need it, and in fact enables us to sometimes skip serializing the result as well (see the refinements section below).
Try-mark-green works as follows:
- First check if the query Q was executed during the previous compilation.
  - If not, we can just re-execute the query as normal, and assign it the color of red.
  - If yes, then load the 'dependent queries' of Q.
- If there is a saved result, then we load the `reads(Q)` vector from the query DAG. The "reads" is the set of queries that Q executed during its execution.
  - For each query R in `reads(Q)`, we recursively demand the color of R using try-mark-green.
    - Note: it is important that we visit each node in `reads(Q)` in the same order as they occurred in the original compilation. See the section on the query DAG below.
    - If any of the nodes in `reads(Q)` wind up colored red, then Q is dirty.
      - We re-execute Q and compare the hash of its result to the hash of the result from the previous compilation.
      - If the hash has not changed, we can mark Q as green and return.
    - Otherwise, all of the nodes in `reads(Q)` must be green. In that case, we can color Q as green and return.
The query DAG
The query DAG code is stored in
src/librustc/dep_graph
. Construction of the DAG is done
by instrumenting the query execution.
One key point is that the query DAG also tracks ordering; that is, for each query Q, we not only track the queries that Q reads, we track the order in which they were read. This allows try-mark-green to walk those queries back in the same order. This is important because once a subquery comes back as red, we can no longer be sure that Q will continue along the same path as before. That is, imagine a query like this:
```rust
fn main_query(tcx) {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}
```
Now imagine that in the first compilation, main_query
starts by
executing subquery1
, and this returns true. In that case, the next
query main_query
executes will be subquery2
, and subquery3
will
not be executed at all.
But now imagine that in the next compilation, the input has
changed such that subquery1
returns false. In this case, subquery2
would never execute. If try-mark-green were to visit reads(main_query)
out
of order, however, it might visit subquery2
before subquery1
, and hence
execute it.
This can lead to ICEs and other problems in the compiler.
Improvements to the basic algorithm
In the description of the basic algorithm, we said that at the end of compilation we would save the results of all the queries that were performed. In practice, this can be quite wasteful – many of those results are very cheap to recompute, and serializing and deserializing them offers no particular benefit. In practice, what we do instead is save the hashes of all the subqueries that we performed. Then, in select cases, we also save the results.
This is why the incremental algorithm separates computing the color of a node, which often does not require its value, from computing the result of a node. Computing the result is done via a simple algorithm like so:
- Check if a saved result for Q is available. If so, compute the color of Q. If Q is green, deserialize and return the saved result.
- Otherwise, execute Q.
- We can then compare the hash of the result and color Q as green if it did not change.
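A hedged sketch of that decision procedure, with stand-in types and helpers; `is_green` plays the role of try-mark-green, and the cache and `execute` closure are hypothetical:

```rust
use std::collections::HashMap;

// Stand-ins: `disk_cache` holds the few results we chose to persist,
// `is_green` stands in for try-mark-green, and `execute` recomputes.
fn get_query_result(
    key: &str,
    disk_cache: &HashMap<String, String>,
    is_green: impl Fn(&str) -> bool,
    execute: impl Fn(&str) -> String,
) -> String {
    if is_green(key) {
        if let Some(saved) = disk_cache.get(key) {
            // Green and persisted: reuse the saved result.
            return saved.clone();
        }
    }
    // Red, unknown, or simply never persisted: recompute the value. The
    // real system would now hash it to decide the node's color.
    execute(key)
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert("type_of(foo)".to_string(), "i32".to_string());
    let r = get_query_result("type_of(foo)", &cache, |_| true, |_| "recomputed".into());
    assert_eq!(r, "i32");
}
```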
Resources
The initial design document can be found at https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md. It expands on the memoization details and provides a more high-level overview of, and motivation for, this system.
Footnotes

[^1]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis
Incremental Compilation In Detail
The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. It relies on the fact that:
- queries are pure functions -- given the same inputs, a query will always yield the same result, and
- the query model structures compilation in an acyclic graph that makes dependencies between individual computations explicit.
This chapter will explain how we can use these properties for making things incremental and then goes on to discuss various implementation issues.
A Basic Algorithm For Incremental Query Evaluation
As explained in the query evaluation model primer, query invocations form a directed-acyclic graph. Here's the example from the previous chapter again:
```
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
                                                               |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
                    v                                          |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+
```
Since every access from one query to another has to go through the query context, we can record these accesses and thus actually build this dependency graph in memory. With dependency tracking enabled, when compilation is done, we know which queries were invoked (the nodes of the graph) and for each invocation, which other queries or input has gone into computing the query's result (the edges of the graph).
Now suppose we change the source code of our program so that the HIR of bar looks different than before. Our goal is to only recompute
those queries that are actually affected by the change while just re-using
the cached results of all the other queries. Given the dependency graph we can
do exactly that. For a given query invocation, the graph tells us exactly
what data has gone into computing its results, we just have to follow the
edges until we reach something that has changed. If we don't encounter
anything that has changed, we know that the query still would evaluate to
the same result we already have in our cache.
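As a toy model of this walk (not rustc's actual data structures), one can think of it as a recursive validity check over a map of edges:

```rust
use std::collections::HashMap;

// Toy version of the walk described above: nodes are named by strings and
// `changed` lists the inputs that differ from the previous session.
fn still_valid(node: &str, edges: &HashMap<&str, Vec<&str>>, changed: &[&str]) -> bool {
    if changed.contains(&node) {
        return false;
    }
    edges
        .get(node)
        .map_or(true, |deps| deps.iter().all(|d| still_valid(d, edges, changed)))
}

fn main() {
    let mut edges = HashMap::new();
    edges.insert("type_of(foo)", vec!["Hir(foo)"]);
    edges.insert("type_of(bar)", vec!["Hir(bar)"]);
    edges.insert("type_check_item(foo)", vec!["type_of(foo)", "type_of(bar)"]);
    let changed = ["Hir(bar)"];
    assert!(still_valid("type_of(foo)", &edges, &changed));
    assert!(!still_valid("type_check_item(foo)", &edges, &changed));
}
```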
Taking the type_of(foo)
invocation from above as example, we can check
whether the cached result is still valid by following the edges to its
inputs. The only edge leads to Hir(foo)
, an input that has not been affected
by the change. So we know that the cached result for type_of(foo)
is still
valid.
The story is a bit different for type_check_item(foo)
: We again walk the
edges and already know that type_of(foo)
is fine. Then we get to
type_of(bar)
which we have not checked yet, so we walk the edges of
type_of(bar)
and encounter Hir(bar)
which has changed. Consequently
the result of type_of(bar)
might be different than what we
have in the cache and, transitively, the result of type_check_item(foo)
might have changed too. We thus re-run type_check_item(foo)
, which in
turn will re-run type_of(bar)
, which will yield an up-to-date result
because it reads the up-to-date version of Hir(bar)
.
The Problem With The Basic Algorithm: False Positives
If you read the previous paragraph carefully, you'll notice that it says that
type_of(bar)
might have changed because one of its inputs has changed.
There's also the possibility that it might still yield exactly the same
result even though its input has changed. Consider an example with a
simple query that just computes the sign of an integer:
```
IntValue(x) <---- sign_of(x) <--- some_other_query(x)
```
Let's say that IntValue(x)
starts out as 1000
and then is set to 2000
.
Even though IntValue(x)
is different in the two cases, sign_of(x)
yields
the result +
in both cases.
If we follow the basic algorithm, however, some_other_query(x)
would have to
(unnecessarily) be re-evaluated because it transitively depends on a changed
input. Change detection yields a "false positive" in this case because it has
to conservatively assume that some_other_query(x)
might be affected by that
changed input.
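The example is easy to reproduce in plain Rust; nothing here is compiler machinery, it just shows a query-like function whose output is insensitive to this particular input change:

```rust
fn sign_of(x: i64) -> char {
    if x >= 0 { '+' } else { '-' }
}

fn main() {
    // IntValue(x) changes from 1000 to 2000, but sign_of(x) does not:
    assert_eq!(sign_of(1000), sign_of(2000));
}
```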
Unfortunately it turns out that the actual queries in the compiler are full of examples like this, and small changes to the input can potentially affect very large parts of the output binaries. As a consequence, we had to make the change detection system smarter and more accurate.
Improving Accuracy: The red-green Algorithm
The "false positives" problem can be solved by interleaving change detection and query re-evaluation. Instead of walking the graph all the way to the inputs when trying to find out if some cached result is still valid, we can check if a result has actually changed after we were forced to re-evaluate it.
We call this algorithm, for better or worse, the red-green algorithm because nodes in the dependency graph are assigned the color green if we were able to prove that its cached result is still valid and the color red if the result has turned out to be different after re-evaluating it.
The meat of red-green change tracking is implemented in the try-mark-green algorithm, that, you've guessed it, tries to mark a given node as green:
```rust
fn try_mark_green(tcx, current_node) -> bool {
    // Fetch the inputs to `current_node`, i.e. get the nodes that the direct
    // edges from `node` lead to.
    let dependencies = tcx.dep_graph.get_dependencies_of(current_node);

    // Now check all the inputs for changes
    for dependency in dependencies {
        match tcx.dep_graph.get_node_color(dependency) {
            Green => {
                // This input has already been checked before and it has not
                // changed; so we can go on to check the next one
            }
            Red => {
                // We found an input that has changed. We cannot mark
                // `current_node` as green without re-running the
                // corresponding query.
                return false
            }
            Unknown => {
                // This is the first time we are looking at this node. Let's try
                // to mark it green by calling try_mark_green() recursively.
                if try_mark_green(tcx, dependency) {
                    // We successfully marked the input as green, on to the
                    // next.
                } else {
                    // We could *not* mark the input as green. This means we
                    // don't know if its value has changed. In order to find
                    // out, we re-run the corresponding query now!
                    tcx.run_query_for(dependency);

                    // Fetch and check the node color again. Running the query
                    // has forced it to either red (if it yielded a different
                    // result than we have in the cache) or green (if it
                    // yielded the same result).
                    match tcx.dep_graph.get_node_color(dependency) {
                        Red => {
                            // The input turned out to be red, so we cannot
                            // mark `current_node` as green.
                            return false
                        }
                        Green => {
                            // Re-running the query paid off! The result is the
                            // same as before, so this particular input does
                            // not invalidate `current_node`.
                        }
                        Unknown => {
                            // There is no way a node has no color after
                            // re-running the query.
                            panic!("unreachable")
                        }
                    }
                }
            }
        }
    }

    // If we have gotten through the entire loop, it means that all inputs
    // have turned out to be green. If all inputs are unchanged, it means
    // that the query result corresponding to `current_node` cannot have
    // changed either.
    tcx.dep_graph.mark_green(current_node);

    true
}

// Note: The actual implementation can be found in
// src/librustc/dep_graph/graph.rs
```
By using red-green marking we can avoid the devastating cumulative effect of
having false positives during change detection. Whenever a query is executed
in incremental mode, we first check if it's already green. If not, we run
try_mark_green()
on it. If it still isn't green after that, then we actually
invoke the query provider to re-compute the result.
The Real World: How Persistence Makes Everything Complicated
The sections above described the underlying algorithm for incremental compilation, but because the compiler process exits after being finished and takes the query context with its result cache into oblivion, we have to persist data to disk so the next compilation session can make use of it. This comes with a whole new set of implementation challenges:
- The query result cache is stored to disk, so the cached results are not readily available for change comparison.
- A subsequent compilation session will start off with a new version of the code that has arbitrary changes applied to it. All kinds of IDs and indices that are generated from a global, sequential counter (e.g. `NodeId`, `DefId`, etc.) might have shifted, making the persisted results on disk not immediately usable anymore, because the same numeric IDs and indices might refer to completely new things in the new compilation session.
- Persisting things to disk comes at a cost, so not every tiny piece of information should be actually cached in between compilation sessions. Fixed-sized, plain-old-data is preferred to complex things that need to run branching code during (de-)serialization.
The following sections describe how the compiler currently solves these issues.
A Question Of Stability: Bridging The Gap Between Compilation Sessions
As noted before, various IDs (like DefId
) are generated by the compiler in a
way that depends on the contents of the source code being compiled. ID assignment
is usually deterministic, that is, if the exact same code is compiled twice,
the same things will end up with the same IDs. However, if something
changes, e.g. a function is added in the middle of a file, there is no
guarantee that anything will have the same ID as it had before.
As a consequence we cannot represent the data in our on-disk cache the same
way it is represented in memory. For example, if we just stored a piece
of type information like TyKind::FnDef(DefId, &'tcx Substs<'tcx>)
(as we do
in memory) and then the contained DefId
points to a different function in
a new compilation session we'd be in trouble.
The solution to this problem is to find "stable" forms for IDs which remain
valid in between compilation sessions. For the most important case, DefId
s,
these are the so-called DefPath
s. Each DefId
has a
corresponding DefPath
but in place of a numeric ID, a DefPath
is based on
the path to the identified item, e.g. std::collections::HashMap
. The
advantage of an ID like this is that it is not affected by unrelated changes.
For example, one can add a new function to std::collections
but
std::collections::HashMap
would still be std::collections::HashMap
. A
DefPath
is "stable" across changes made to the source code while a DefId
isn't.
There is also the DefPathHash
which is just a 128-bit hash value of the
DefPath
. The two contain the same information and we mostly use the
DefPathHash
because it is simpler to handle, being Copy
and self-contained.
This principle of stable identifiers is used to make the data in the on-disk
cache resilient to source code changes. Instead of storing a DefId
, we store
the DefPathHash
and when we deserialize something from the cache, we map the
DefPathHash
to the corresponding DefId
in the current compilation session
(which is just a simple hash table lookup).
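A minimal sketch of that lookup, with stand-in types (`DefPathHash` as a bare `u128` and `DefId` as a bare number) rather than rustc's real definitions:

```rust
use std::collections::HashMap;

// Stand-in types for illustration; rustc's real DefPathHash is a dedicated
// 128-bit hash type and DefId is a (CrateNum, DefIndex) pair.
type DefPathHash = u128;
type DefId = u32;

// Deserialization maps the stable on-disk hash back to the session-local id.
fn to_current_def_id(saved: DefPathHash, table: &HashMap<DefPathHash, DefId>) -> Option<DefId> {
    table.get(&saved).copied()
}

fn main() {
    let mut table = HashMap::new();
    table.insert(0xABCD_u128, 7_u32);
    assert_eq!(to_current_def_id(0xABCD, &table), Some(7));
}
```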
The HirId
, used for identifying HIR components that don't have their own
DefId
, is another such stable ID. It is (conceptually) a pair of a DefPath
and a LocalId
, where the LocalId
identifies something (e.g. a hir::Expr
)
locally within its "owner" (e.g. a hir::Item
). If the owner is moved around,
the LocalId
s within it are still the same.
Checking Query Results For Changes: StableHash And Fingerprints
In order to do red-green-marking we often need to check if the result of a query has changed compared to the result it had during the previous compilation session. There are two performance problems with this though:
- We'd like to avoid having to load the previous result from disk just for doing the comparison. We already computed the new result and will use that. Also loading a result from disk will "pollute" the interners with data that is unlikely to ever be used.
- We don't want to store each and every result in the on-disk cache. For example, it would be wasted effort to persist things to disk that are already available in upstream crates.
The compiler avoids these problems by using so-called Fingerprint
s. Each time
a new query result is computed, the query engine will compute a 128 bit hash
value of the result. We call this hash value "the Fingerprint
of the query
result". The hashing is (and has to be) done "in a stable way". This means
that whenever something is hashed that might change in between compilation
sessions (e.g. a DefId
), we instead hash its stable equivalent
(e.g. the corresponding DefPath
). That's what the whole StableHash
infrastructure is for. This way Fingerprint
s computed in two
different compilation sessions are still comparable.
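As an illustration only (rustc uses its own 128-bit stable hasher over structured data, not `DefaultHasher` over strings), the idea of hashing the stable form rather than the volatile id looks like this:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash the stable `DefPath`-like string instead of a volatile numeric id.
// DefaultHasher and the 64-bit width are illustrative simplifications.
fn fingerprint(stable_path: &str) -> u64 {
    let mut h = DefaultHasher::new();
    stable_path.hash(&mut h);
    h.finish()
}

fn main() {
    // Two computations of the fingerprint of the same stable path agree,
    // regardless of which numeric DefId the item happened to get.
    assert_eq!(fingerprint("std::collections::HashMap"),
               fingerprint("std::collections::HashMap"));
}
```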
The next step is to store these fingerprints along with the dependency graph. This is cheap since fingerprints are just bytes to be copied. It's also cheap to load the entire set of fingerprints together with the dependency graph.
Now, when red-green-marking reaches the point where it needs to check if a result has changed, it can just compare the (already loaded) previous fingerprint to the fingerprint of the new result.
This approach works rather well but it's not without flaws:
- There is a small possibility of hash collisions. That is, two different results could have the same fingerprint and the system would erroneously assume that the result hasn't changed, leading to a missed update.

  We mitigate this risk by using a high-quality hash function and a 128 bit wide hash value. Due to these measures the practical risk of a hash collision is negligible.

- Computing fingerprints is quite costly. It is the main reason why incremental compilation can be slower than non-incremental compilation. We are forced to use a good and thus expensive hash function, and we have to map things to their stable equivalents while doing the hashing.
In the future we might want to explore different approaches to this problem.
For now it's StableHash
and Fingerprint
.
A Tale Of Two DepGraphs: The Old And The New
The initial description of dependency tracking glosses over a few details that quickly become a head scratcher when actually trying to implement things. In particular it's easy to overlook that we are actually dealing with two dependency graphs: The one we built during the previous compilation session and the one that we are building for the current compilation session.
When a compilation session starts, the compiler loads the previous dependency
graph into memory as an immutable piece of data. Then, when a query is invoked,
it will first try to mark the corresponding node in the graph as green. This
means really that we are trying to mark the node in the previous dep-graph
as green that corresponds to the query key in the current session. How do we
do this mapping between current query key and previous DepNode
? The answer
is again Fingerprint
s: Nodes in the dependency graph are identified by a
fingerprint of the query key. Since fingerprints are stable across compilation
sessions, computing one in the current session allows us to find a node
in the dependency graph from the previous session. If we don't find a node with
the given fingerprint, it means that the query key refers to something that
did not yet exist in the previous session.
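A sketch of that lookup, again with stand-in types; only the shape of the mapping matters here:

```rust
use std::collections::HashMap;

// Illustrative only: the previous session's graph is indexed by the stable
// fingerprint of each query key, so a current-session key can locate "its"
// node without sharing any numeric ids with the old session.
type Fingerprint = u128;

struct PrevGraph {
    node_by_fingerprint: HashMap<Fingerprint, usize>,
}

impl PrevGraph {
    fn node_for_key(&self, key_fingerprint: Fingerprint) -> Option<usize> {
        // `None` means the key did not exist in the previous session.
        self.node_by_fingerprint.get(&key_fingerprint).copied()
    }
}
```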
So, having found the dep-node in the previous dependency graph, we can look up its dependencies (also dep-nodes in the previous graph) and continue with the rest of the try-mark-green algorithm. The next interesting thing happens when we successfully mark the node as green. At that point we copy the node and the edges to its dependencies from the old graph into the new graph. We have to do this because the new dep-graph cannot acquire the node and edges via the regular dependency tracking. The tracking system can only record edges while actually running a query -- but running the query, although we have the result already cached, is exactly what we want to avoid.
Once the compilation session has finished, all the unchanged parts have been copied over from the old into the new dependency graph, while the changed parts have been added to the new graph by the tracking system. At this point, the new graph is serialized out to disk, alongside the query result cache, and can act as the previous dep-graph in a subsequent compilation session.
Didn't You Forget Something?: Cache Promotion
TODO
The Future: Shortcomings Of The Current System and Possible Solutions
TODO
Debugging and Testing Dependencies
Testing the dependency graph
There are various ways to write tests against the dependency graph.
The simplest mechanisms are the #[rustc_if_this_changed]
and
#[rustc_then_this_would_need]
annotations. These are used in compile-fail
tests to test whether the expected set of paths exist in the dependency graph.
As an example, see src/test/compile-fail/dep-graph-caller-callee.rs
.
The idea is that you can annotate a test like:
```rust
#[rustc_if_this_changed]
fn foo() { }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR OK
fn bar() { foo(); }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR no path
fn baz() { }
```
This will check whether there is a path in the dependency graph from Hir(foo)
to TypeckTables(bar)
. An error is reported for each
#[rustc_then_this_would_need]
annotation that indicates whether a path
exists. //~ ERROR
annotations can then be used to test if a path is found (as
demonstrated above).
Debugging the dependency graph
Dumping the graph
The compiler is also capable of dumping the dependency graph for your
debugging pleasure. To do so, pass the -Z dump-dep-graph
flag. The
graph will be dumped to dep_graph.{txt,dot}
in the current
directory. You can override the filename with the RUST_DEP_GRAPH
environment variable.
Frequently, though, the full dep graph is quite overwhelming and not particularly helpful. Therefore, the compiler also allows you to filter the graph. You can filter in three ways:
- All edges originating in a particular set of nodes (usually a single node).
- All edges reaching a particular set of nodes.
- All edges that lie between given start and end nodes.
To filter, use the RUST_DEP_GRAPH_FILTER
environment variable, which should
look like one of the following:
```
source_filter                  // nodes originating from source_filter
-> target_filter               // nodes that can reach target_filter
source_filter -> target_filter // nodes in between source_filter and target_filter
```
source_filter
and target_filter
are a &
-separated list of strings.
A node is considered to match a filter if all of those strings appear in its
label. So, for example:
```
RUST_DEP_GRAPH_FILTER='-> TypeckTables'
```
would select the predecessors of all TypeckTables
nodes. Usually though you
want the TypeckTables
node for some particular fn, so you might write:
```
RUST_DEP_GRAPH_FILTER='-> TypeckTables & bar'
```
This will select only the predecessors of TypeckTables
nodes for functions
with bar
in their name.
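The matching rule is simple enough to sketch directly; this toy `matches` function is an assumption about the behavior described above, not the compiler's actual code:

```rust
// A node label matches a filter if it contains every `&`-separated string.
fn matches(label: &str, filter: &str) -> bool {
    filter.split('&').all(|part| label.contains(part.trim()))
}

fn main() {
    assert!(matches("TypeckTables(bar)", "TypeckTables & bar"));
    assert!(!matches("TypeckTables(baz)", "TypeckTables & bar"));
}
```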
Perhaps you are finding that when you change foo
you need to re-type-check
bar
, but you don't think you should have to. In that case, you might do:
```
RUST_DEP_GRAPH_FILTER='Hir & foo -> TypeckTables & bar'
```
This will dump out all the nodes that lead from Hir(foo)
to
TypeckTables(bar)
, from which you can (hopefully) see the source
of the erroneous edge.
Tracking down incorrect edges
Sometimes, after you dump the dependency graph, you will find some
path that should not exist, but you will not be quite sure how it came
to be. When the compiler is built with debug assertions, it can
help you track that down. Simply set the RUST_FORBID_DEP_GRAPH_EDGE
environment variable to a filter. Every edge created in the dep-graph
will be tested against that filter – if it matches, a bug!
is
reported, so you can easily see the backtrace (RUST_BACKTRACE=1
).
The syntax for these filters is the same as described in the previous section. However, note that this filter is applied to every edge and doesn't handle longer paths in the graph, unlike the previous section.
Example:
You find that there is a path from the Hir
of foo
to the type
check of bar
and you don't think there should be. You dump the
dep-graph as described in the previous section and open dep-graph.txt
to see something like:
```
Hir(foo) -> Collect(bar)
Collect(bar) -> TypeckTables(bar)
```
That first edge looks suspicious to you. So you set
RUST_FORBID_DEP_GRAPH_EDGE
to Hir&foo -> Collect&bar
, re-run, and
then observe the backtrace. Voila, bug fixed!
The Parser
The parser is responsible for converting raw Rust source code into a structured
form which is easier for the compiler to work with, usually called an Abstract
Syntax Tree. An AST mirrors the structure of a Rust program in memory,
using a Span
to link a particular AST node back to its source text.
The bulk of the parser lives in the libsyntax crate.
As with most parsers, parsing is composed of two main steps:
- lexical analysis – turn a stream of characters into a stream of token trees
- parsing – turn the token trees into an AST
The `syntax` crate contains several main players:

- a `SourceMap` for mapping AST nodes to their source code
- the `ast` module contains types corresponding to each AST node
- a `StringReader` for lexing source code into tokens
- the `parser` module and `Parser` struct are in charge of actually parsing tokens into AST nodes,
- and a `visit` module for walking the AST and inspecting or mutating the AST nodes.
The main entrypoint to the parser is via the various parse_*
functions in the
parser module. They let you do things like turn a SourceFile
(e.g. the source in a single file) into a token stream, create a parser from
the token stream, and then execute the parser to get a Crate
(the root AST
node).
To minimise the amount of copying that is done, both the StringReader
and
Parser
have lifetimes which bind them to the parent ParseSess
. This contains
all the information needed while parsing, as well as the SourceMap
itself.
The #[test] attribute
Today, Rust programmers rely on a built-in attribute called #[test]. All you have to do is mark a function as a test and include some asserts like so:
```rust
#[test]
fn my_test() {
    assert!(2+2 == 4);
}
```
When this program is compiled using rustc --test
or cargo test
, it will
produce an executable that can run this, and any other test function. This
method of testing allows tests to live alongside code in an organic way. You
can even put tests inside private modules:
```rust
mod my_priv_mod {
    fn my_priv_func() -> bool { true }

    #[test]
    fn test_priv_func() {
        assert!(my_priv_func());
    }
}
```
Private items can thus be easily tested without worrying about how to expose them to any sort of external testing apparatus. This is key to the ergonomics of testing in Rust. Semantically, however, it's rather odd.
How does any sort of main
function invoke these tests if they're not visible?
What exactly is rustc --test
doing?
#[test] is implemented as a syntactic transformation inside the compiler's libsyntax crate. Essentially, it's a fancy macro that rewrites the crate in 3 steps:
Step 1: Re-Exporting
As mentioned earlier, tests can exist inside private modules, so we need a
way of exposing them to the main function, without breaking any existing
code. To that end, libsyntax
will create local modules called
__test_reexports
that recursively reexport tests. This expansion translates
the above example into:
```rust
mod my_priv_mod {
    fn my_priv_func() -> bool { true }

    pub fn test_priv_func() {
        assert!(my_priv_func());
    }

    pub mod __test_reexports {
        pub use super::test_priv_func;
    }
}
```
Now, our test can be accessed as
my_priv_mod::__test_reexports::test_priv_func
. For deeper module
structures, __test_reexports
will reexport modules that contain tests, so a
test at a::b::my_test
becomes
a::__test_reexports::b::__test_reexports::my_test
. While this process seems
pretty safe, what happens if there is an existing __test_reexports
module?
The answer: nothing.
To explain, we need to understand how the AST represents
identifiers. The name of every function, variable, module, etc. is
not stored as a string, but rather as an opaque Symbol which is
essentially an ID number for each identifier. The compiler keeps a separate
hashtable that allows us to recover the human-readable name of a Symbol when
necessary (such as when printing a syntax error). When the compiler generates
the __test_reexports
module, it generates a new Symbol for the identifier,
so while the compiler-generated __test_reexports
may share a name with your
hand-written one, it will not share a Symbol. This technique prevents name
collision during code generation and is the foundation of Rust's macro
hygiene.
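A toy model of this gensym-style behavior (the real interner lives inside rustc and is more involved):

```rust
// Every generated identifier gets a fresh Symbol, so a generated
// `__test_reexports` never collides with a hand-written module of the
// same name.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Symbol(u32);

struct Interner {
    names: Vec<String>,
}

impl Interner {
    fn gensym(&mut self, name: &str) -> Symbol {
        // Intentionally no lookup: same text, brand-new Symbol.
        self.names.push(name.to_string());
        Symbol(self.names.len() as u32 - 1)
    }

    fn as_str(&self, s: Symbol) -> &str {
        &self.names[s.0 as usize]
    }
}

fn main() {
    let mut interner = Interner { names: Vec::new() };
    let user = interner.gensym("__test_reexports");
    let generated = interner.gensym("__test_reexports");
    assert_eq!(interner.as_str(user), interner.as_str(generated)); // same text
    assert_ne!(user, generated); // different Symbols: no collision
}
```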
Step 2: Harness Generation
Now that our tests are accessible from the root of our crate, we need to do
something with them. libsyntax
generates a module like so:
```rust
#[main]
pub fn main() {
    extern crate test;
    test::test_main_static(&[&path::to::test1, /*...*/]);
}
```
where path::to::test1
is a constant of type test::TestDescAndFn
.
While this transformation is simple, it gives us a lot of insight into how
tests are actually run. The tests are aggregated into an array and passed to
a test runner called test_main_static
. We'll come back to exactly what
TestDescAndFn
is, but for now, the key takeaway is that there is a crate
called test
that is part of Rust core, that implements all of the
runtime for testing. test
's interface is unstable, so the only stable way
to interact with it is through the #[test]
macro.
Step 3: Test Object Generation
If you've written tests in Rust before, you may be familiar with some of the
optional attributes available on test functions. For example, a test can be
annotated with #[should_panic]
if we expect the test to cause a panic. It
looks something like this:
```rust
#[test]
#[should_panic]
fn foo() {
    panic!("intentional");
}
```
This means our tests are more than just simple functions, they have
configuration information as well. test
encodes this configuration data
into a struct called TestDesc
. For each test function in a
crate, libsyntax
will parse its attributes and generate a TestDesc
instance. It then combines the TestDesc
and test function into the
predictably named TestDescAndFn
struct, that test_main_static
operates
on. For a given test, the generated TestDescAndFn
instance looks like so:
```rust
self::test::TestDescAndFn{
    desc: self::test::TestDesc{
        name: self::test::StaticTestName("foo"),
        ignore: false,
        should_panic: self::test::ShouldPanic::Yes,
        allow_fail: false,
    },
    testfn: self::test::StaticTestFn(||
        self::test::assert_test_result(::crate::__test_reexports::foo())),
}
```
Once we've constructed an array of these test objects, they're passed to the test runner via the harness generated in step 2.
Inspecting the generated code
On nightly rust, there's an unstable flag called unpretty
that you can use
to print out the module source after macro expansion:
```
$ rustc my_mod.rs -Z unpretty=hir
```
Macro expansion
Macro expansion happens during parsing. rustc
has two parsers, in fact: the
normal Rust parser, and the macro parser. During the parsing phase, the normal
Rust parser will set aside the contents of macros and their invocations. Later,
before name resolution, macros are expanded using these portions of the code.
The macro parser, in turn, may call the normal Rust parser when it needs to
bind a metavariable (e.g. $my_expr
) while parsing the contents of a macro
invocation. The code for macro expansion is in
src/libsyntax/ext/tt/
. This chapter aims to explain how macro
expansion works.
Example
It's helpful to have an example to refer to. For the remainder of this chapter, whenever we refer to the "example definition", we mean the following:
```rust
macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}
```
$mvar
is called a metavariable. Unlike normal variables, rather than
binding to a value in a computation, a metavariable binds at compile time to
a tree of tokens. A token is a single "unit" of the grammar, such as an
identifier (e.g. foo
) or punctuation (e.g. =>
). There are also other
special tokens, such as EOF
, which indicates that there are no more tokens.
Token trees result from paired parentheses-like characters ((
...)
,
[
...]
, and {
...}
) – they include the open and close and all the tokens
in between (we do require that parentheses-like characters be balanced). Having
macro expansion operate on token streams rather than the raw bytes of a source
file abstracts away a lot of complexity. The macro expander (and much of the
rest of the compiler) doesn't really care that much about the exact line and
column of some syntactic construct in the code; it cares about what constructs
are used in the code. Using tokens allows us to care about what without
worrying about where. For more information about tokens, see the
Parsing chapter of this book.
Whenever we refer to the "example invocation", we mean the following snippet:
```rust
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```
The process of expanding the macro invocation into the syntax tree
println!("{}", foo)
and then expanding that into a call to Display::fmt
is
called macro expansion, and it is the topic of this chapter.
The macro parser
There are two parts to macro expansion: parsing the definition and parsing the invocations. Interestingly, both are done by the macro parser.
Basically, the macro parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the Earley parsing
algorithm. The macro parser is
defined in src/libsyntax/ext/tt/macro_parser.rs
.
The interface of the macro parser is as follows (this is slightly simplified):
```rust
fn parse(
    sess: ParserSession,
    tts: TokenStream,
    ms: &[TokenTree]
) -> NamedParseResult
```
In this interface:
- `sess` is a "parsing session", which keeps track of some metadata. Most notably, this is used to keep track of errors that are generated so they can be reported to the user.
- `tts` is a stream of tokens. The macro parser's job is to consume the raw stream of tokens and output a binding of metavariables to corresponding token trees.
- `ms` is a matcher. This is a sequence of token trees that we want to match `tts` against.
In the analogy of a regex parser, tts
is the input and we are matching it
against the pattern ms
. Using our examples, tts
could be the stream of
tokens containing the inside of the example invocation print foo
, while ms
might be the sequence of token (trees) print $mvar:ident
.
The output of the parser is a NamedParseResult
, which indicates which of
three cases has occurred:
- Success: `tts` matches the given matcher `ms`, and we have produced a binding from metavariables to the corresponding token trees.
- Failure: `tts` does not match `ms`. This results in an error message such as "No rule expected token blah".
- Error: some fatal error has occurred in the parser. For example, this happens if there is more than one matching pattern, since that indicates the macro is ambiguous.
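Shape-wise, one can picture the result like the following enum; this is an illustration only, and the real `NamedParseResult` differs in its details:

```rust
use std::collections::HashMap;

// The three outcomes listed above. In rustc the success bindings map
// metavariable names to matched token trees rather than strings.
enum ParseOutcome {
    Success(HashMap<String, String>), // metavariable -> matched tokens
    Failure(String),                  // "no rule expected token ..."
    Error(String),                    // fatal, e.g. ambiguous macro
}
```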
The full interface is defined here.
The macro parser does pretty much exactly the same as a normal regex parser with
one exception: in order to parse different types of metavariables, such as
ident
, block
, expr
, etc., the macro parser must sometimes call back to the
normal Rust parser.
As mentioned above, both definitions and invocations of macros are parsed using
the macro parser. This is extremely non-intuitive and self-referential. The code
to parse macro definitions is in
src/libsyntax/ext/tt/macro_rules.rs
. It defines the pattern for
matching for a macro definition as $( $lhs:tt => $rhs:tt );+
. In other words,
a macro_rules
definition should have in its body at least one occurrence of a
token tree followed by =>
followed by another token tree. When the compiler
comes to a macro_rules
definition, it uses this pattern to match the two token
trees per rule in the definition of the macro using the macro parser itself.
In our example definition, the metavariable $lhs
would match the patterns of
both arms: (print $mvar:ident)
and (print twice $mvar:ident)
. And $rhs
would match the bodies of both arms: { println!("{}", $mvar); }
and { println!("{}", $mvar); println!("{}", $mvar); }
. The parser would keep this
knowledge around for when it needs to expand a macro invocation.
When the compiler comes to a macro invocation, it parses that invocation using
the same NFA-based macro parser that is described above. However, the matcher
used is the first token tree ($lhs
) extracted from the arms of the macro
definition. Using our example, we would try to match the token stream print foo
from the invocation against the matchers print $mvar:ident
and print twice $mvar:ident
that we previously extracted from the definition. The
algorithm is exactly the same, but when the macro parser comes to a place in the
current matcher where it needs to match a non-terminal (e.g. $mvar:ident
),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an ident
token,
which it finds (foo
) and returns to the macro parser. Then, the macro parser
proceeds in parsing as normal. Also, note that exactly one of the matchers from
the various arms should match the invocation; if there is more than one match,
the parse is ambiguous, while if there are no matches at all, there is a syntax
error.
For more information about the macro parser's implementation, see the comments
in src/libsyntax/ext/tt/macro_parser.rs
.
Hygiene
If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code:
```c
#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};

// Then, somewhere else
struct Bar {
    ...
};

DEFINE_FOO
```
Most people avoid writing C like this – and for good reason: it doesn't
compile. The struct Bar
defined by the macro clashes with the struct Bar
defined in the code. Consider also the following example:
```c
#define DO_FOO(x) {\
    int y = 0;\
    foo(x, y);\
}

// Then elsewhere
int y = 22;
DO_FOO(y);
```
Do you see the problem? We wanted to generate a call foo(22, 0)
, but instead
we got foo(0, 0)
because the macro defined its own y
!
These are both examples of macro hygiene issues. Hygiene relates to how to handle names defined within a macro. In particular, a hygienic macro system prevents errors due to names introduced within a macro. Rust macros are hygienic in that they do not allow one to write the sorts of bugs above.
At a high level, hygiene within the rust compiler is accomplished by keeping track of the context where a name is introduced and used. We can then disambiguate names based on that context. Future iterations of the macro system will allow greater control to the macro author to use that context. For example, a macro author may want to introduce a new name to the context where the macro was called. Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro).
In rustc, this "context" is tracked via Span
s.
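A toy model of context-tracked names, assuming a bare numeric context instead of the real `Span`-based machinery:

```rust
// Two identifiers are "the same name" only if both spelling and syntax
// context match. In rustc the context is carried on the `Span`.
#[derive(PartialEq, Eq, Debug)]
struct Ident {
    text: String,
    ctxt: u32, // 0 = written by the user, 1 = introduced by a macro, ...
}

fn main() {
    let user_y = Ident { text: "y".into(), ctxt: 0 };
    let macro_y = Ident { text: "y".into(), ctxt: 1 };
    assert_ne!(user_y, macro_y); // the macro's `y` cannot clash with the user's
}
```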
TODO: what is call-site hygiene? what is def-site hygiene?
TODO
Procedural Macros
TODO
Custom Derive
TODO
TODO: maybe something about macros 2.0?
Name resolution
The name resolution is a two-phase process. In the first phase, which runs
during macro expansion, we build a tree of modules and resolve imports. Macro
expansion and name resolution communicate with each other via the Resolver
trait, defined in libsyntax
.
The input to the second phase is the syntax tree, produced by parsing input files and expanding macros. This phase produces links from all the names in the source to relevant places where the name was introduced. It also generates helpful error messages, like typo suggestions, traits to import or lints about unused items.
A successful run of the second phase (`Resolver::resolve_crate`) creates a kind of index that the rest of the compilation may use to ask about the present names (through the `hir::lowering::Resolver` interface).
The name resolution lives in the librustc_resolve
crate, with the meat in
lib.rs
and some helpers or symbol-type specific logic in the other modules.
Namespaces
Different kinds of symbols live in different namespaces ‒ e.g. types don't clash with variables. This usually doesn't happen, because variables start with a lower-case letter while types start with an upper-case one, but this is only a convention. This is legal Rust code that'll compile (with warnings):
```rust
type x = u32;
let x: x = 1;
let y: x = 2; // See? x is still a type here.
```
To cope with this, and with slightly different scoping rules for these namespaces, the resolver keeps them separated and builds separate structures for them.
In other words, when the code talks about namespaces, it doesn't mean the module hierarchy, it's types vs. values vs. macros.
Scopes and ribs
A name is visible only in certain area in the source code. This forms a hierarchical structure, but not necessarily a simple one ‒ if one scope is part of another, it doesn't mean the name visible in the outer one is also visible in the inner one, or that it refers to the same thing.
To cope with that, the compiler introduces the concept of Ribs. A rib is an abstraction of a scope. Every time the set of visible names potentially changes, a new rib is pushed onto a stack. The places where this can happen include, for example:
- The obvious places ‒ curly braces enclosing a block, function boundaries, modules.
- Introducing a let binding ‒ this can shadow another binding with the same name.
- Macro expansion border ‒ to cope with macro hygiene.
When searching for a name, the stack of ribs is traversed from the innermost outwards. This helps to find the closest meaning of the name (the one not shadowed by anything else). The transition to an outer rib may also change the rules for which names are usable ‒ if there are nested functions (not closures), the inner one can't access parameters and local bindings of the outer one, even though they should be visible by ordinary scoping rules. An example:
```rust
fn do_something<T: Default>(val: T) { // <- New rib in both types and values (1)
    // `val` is accessible, as is the helper function
    // `T` is accessible

    let helper = || { // New rib on `helper` (2) and another on the block (3)
        // `val` is accessible here
    }; // End of (3)

    // `val` is accessible, `helper` variable shadows `helper` function
    fn helper() { // <- New rib in both types and values (4)
        // `val` is not accessible here, (4) is not transparent for locals
        // `T` is not accessible here
    } // End of (4)

    let val = T::default(); // New rib (5)
    // `val` is the variable, not the parameter here
} // End of (5), (2) and (1)
```
Because the rules for different namespaces are a bit different, each namespace has its own independent rib stack that is constructed in parallel to the others. In addition, there's also a rib stack for local labels (eg. names of loops or blocks), which isn't a full namespace in its own right.
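A minimal sketch of such a rib stack for one namespace (all names and types here are illustrative, not rustc's actual ones):

```rust
use std::collections::HashMap;

// Search innermost-first, so the closest (non-shadowed) binding wins. Real
// ribs also carry a kind that can forbid crossing certain boundaries
// (e.g. nested `fn` items).
struct Rib {
    bindings: HashMap<String, u32>, // name -> some resolution id
}

fn resolve(name: &str, ribs: &[Rib]) -> Option<u32> {
    ribs.iter().rev().find_map(|rib| rib.bindings.get(name).copied())
}
```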
Overall strategy
To perform the name resolution of the whole crate, the syntax tree is traversed top-down and every encountered name is resolved. This works for most kinds of names, because at the point of use of a name it is already introduced in the Rib hierarchy.
There are some exceptions to this. Items are a bit tricky, because they can be used even before they are encountered ‒ therefore every block needs to be first scanned for items to fill in its Rib.
Other, even more problematic, cases are imports, which need recursive fixed-point resolution, and macros, which need to be resolved and expanded before the rest of the code can be processed.
Therefore, the resolution is performed in multiple stages.
TODO:
This is a result of the first pass of learning the code. It is definitely incomplete and not detailed enough. It also might be inaccurate in places. Still, it probably provides a useful first guidepost to what happens in there.
- What exactly does it link to and how is that published and consumed by following stages of compilation?
- Who calls it and how it is actually used.
- Is it a pass and then the result is only used, or can it be computed incrementally (e.g. for RLS)?
- The overall strategy description is a bit vague.
- Where does the name `Rib` come from?
- Does this thing have its own tests, or is it tested only as part of some e2e testing?
The HIR
The HIR – "High-Level Intermediate Representation" – is the primary IR used
in most of rustc. It is a compiler-friendly representation of the abstract
syntax tree (AST) that is generated after parsing, macro expansion, and name
resolution (see Lowering for how the HIR is created).
Many parts of HIR resemble Rust surface syntax quite closely, with
the exception that some of Rust's expression forms have been desugared away.
For example, for
loops are converted into a loop
and do not appear in
the HIR. This makes HIR more amenable to analysis than a normal AST.
This chapter covers the main concepts of the HIR.
You can view the HIR representation of your code by passing the
-Zunpretty=hir-tree
flag to rustc:
```
> cargo rustc -- -Zunpretty=hir-tree
```
Out-of-band storage and the Crate type
The top-level data-structure in the HIR is the Crate
, which stores
the contents of the crate currently being compiled (we only ever
construct HIR for the current crate). Whereas in the AST the crate
data structure basically just contains the root module, the HIR
Crate
structure contains a number of maps and other things that
serve to organize the content of the crate for easier access.
For example, the contents of individual items (e.g. modules,
functions, traits, impls, etc) in the HIR are not immediately
accessible in the parents. So, for example, if there is a module item
foo
containing a function bar()
:
```rust
mod foo {
    fn bar() { }
}
```
then in the HIR the representation of module foo
(the Mod
struct) would only have the ItemId
I
of bar()
. To get the
details of the function bar()
, we would lookup I
in the
items
map.
One nice result from this representation is that one can iterate over all items in the crate by iterating over the key-value pairs in these maps (without the need to trawl through the whole HIR). There are similar maps for things like trait items and impl items, as well as "bodies" (explained below).
The other reason to set up the representation this way is for better
integration with incremental compilation. This way, if you gain access
to an &hir::Item
(e.g. for the mod foo
), you do not immediately
gain access to the contents of the function bar()
. Instead, you only
gain access to the id for bar()
, and you must invoke some
function to lookup the contents of bar()
given its id; this gives
the compiler a chance to observe that you accessed the data for
bar()
, and then record the dependency.
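Schematically (with stand-in types, not the real `Crate`), the indirection looks like this:

```rust
use std::collections::BTreeMap;

// The parent item stores just an id; the contents are fetched through a map
// lookup -- the hook where the compiler can record that you touched the data.
type ItemId = u32;

struct Crate {
    items: BTreeMap<ItemId, String>, // stand-in for the real item data
}

impl Crate {
    fn item(&self, id: ItemId) -> &str {
        &self.items[&id]
    }
}
```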
Identifiers in the HIR
Most of the code that has to deal with things in HIR tends not to carry around references into the HIR, but rather to carry around identifier numbers (or just "ids"). Right now, you will find four sorts of identifiers in active use:
- `DefId`, which primarily names "definitions" or top-level items.
  - You can think of a `DefId` as being shorthand for a very explicit and complete path, like `std::collections::HashMap`. However, these paths are able to name things that are not nameable in normal Rust (e.g. impls), and they also include extra information about the crate (such as its version number, as two versions of the same crate can co-exist).
  - A `DefId` really consists of two parts, a `CrateNum` (which identifies the crate) and a `DefIndex` (which indexes into a list of items that is maintained per crate).
- `HirId`, which combines the index of a particular item with an offset within that item.
- `BodyId`, an identifier that refers to a specific body (definition of a function or constant) in the crate. It is currently effectively a "newtype'd" `HirId`.
- `NodeId`, which is an absolute id that identifies a single node in the HIR tree.
  - While these are still in common use, they are being slowly phased out.
  - Since they are absolute within the crate, adding a new node anywhere in the tree causes the `NodeId`s of all subsequent code in the crate to change. This is terrible for incremental compilation, as you can perhaps imagine.
The HIR Map
Most of the time when you are working with the HIR, you will do so via
the HIR Map, accessible in the tcx via tcx.hir (and defined in
(and defined in
the hir::map
module). The HIR map contains a number of methods to
convert between IDs of various kinds and to lookup data associated
with an HIR node.
For example, if you have a DefId
, and you would like to convert it
to a NodeId
, you can use
tcx.hir.as_local_node_id(def_id)
. This returns
an Option<NodeId>
– this will be None
if the def-id refers to
something outside of the current crate (since then it has no HIR
node), but otherwise returns Some(n)
where n
is the node-id of the
definition.
Similarly, you can use tcx.hir.find(n)
to lookup the node for a
NodeId
. This returns a Option<Node<'tcx>>
, where Node
is an enum
defined in the map; by matching on this you can find out what sort of
node the node-id referred to and also get a pointer to the data
itself. Often, you know what sort of node n
is – e.g. if you know
that n
must be some HIR expression, you can do
tcx.hir.expect_expr(n)
, which will extract and return the
&hir::Expr
, panicking if n
is not in fact an expression.
Finally, you can use the HIR map to find the parents of nodes, via
calls like tcx.hir.get_parent_node(n)
.
HIR Bodies
A hir::Body
represents some kind of executable code, such as the body
of a function/closure or the definition of a constant. Bodies are
associated with an owner, which is typically some kind of item
(e.g. an fn()
or const
), but could also be a closure expression
(e.g. |x, y| x + y
). You can use the HIR map to find the body
associated with a given def-id (maybe_body_owned_by
) or to find
the owner of a body (body_owner_def_id
).
Lowering
The lowering step converts AST to HIR. This means many structures are removed if they are irrelevant for type analysis or similar syntax agnostic analyses. Examples of such structures include but are not limited to
- Parenthesis
  - Removed without replacement, the tree structure makes order explicit
- `for` loops and `while (let)` loops
  - Converted to `loop` + `match` and some `let` bindings
- `if let`
  - Converted to `match`
- Universal `impl Trait`
  - Converted to generic arguments (but with some flags, to know that the user didn't write them)
- Existential `impl Trait`
  - Converted to a virtual `existential type` declaration
Lowering needs to uphold several invariants in order to not trigger the
sanity checks in src/librustc/hir/map/hir_id_validator.rs
:
- A `HirId` must be used if created. So if you use the `lower_node_id`, you must use the resulting `NodeId` or `HirId` (either is fine, since any `NodeId`s in the `HIR` are checked for existing `HirId`s).
- Lowering a `HirId` must be done in the scope of the owning item. This means you need to use `with_hir_id_owner` if you are creating parts of an item other than the one being currently lowered. This happens for example during the lowering of existential `impl Trait`.
- A `NodeId` that will be placed into a HIR structure must be lowered, even if its `HirId` is unused. Calling `let _ = self.lower_node_id(node_id);` is perfectly legitimate.
- If you are creating new nodes that didn't exist in the `AST`, you must create new ids for them. This is done by calling the `next_id` method, which produces both a new `NodeId` as well as automatically lowering it for you so you also get the `HirId`.
If you are creating new DefId
s, since each DefId
needs to have a
corresponding NodeId
, it is advisable to add these NodeId
s to the
AST
so you don't have to generate new ones during lowering. This has
the advantage of creating a way to find the DefId
of something via its
NodeId
. If lowering needs this DefId
in multiple places, you can't
generate a new NodeId
in all those places because you'd also get a new
DefId
then. With a NodeId
from the AST
this is not an issue.
Having the NodeId
also allows the DefCollector
to generate the DefId
s
instead of lowering having to do it on the fly. Centralizing the DefId
generation in one place makes it easier to refactor and reason about.
HIR Debugging
The -Zunpretty=hir-tree
flag will dump out the HIR.
If you are trying to correlate NodeId
s or DefId
s with source code, the
--pretty expanded,identified
flag may be useful.
TODO: anything else?
The ty module: representing types
The ty
module defines how the Rust compiler represents types
internally. It also defines the typing context (tcx
or TyCtxt
),
which is the central data structure in the compiler.
The tcx and how it uses lifetimes
The tcx
("typing context") is the central data structure in the
compiler. It is the context that you use to perform all manner of
queries. The struct TyCtxt
defines a reference to this shared context:
```rust
tcx: TyCtxt<'a, 'gcx, 'tcx>
//          --  ----  ----
//          |   |     |
//          |   |     innermost arena lifetime (if any)
//          |   "global arena" lifetime
//          lifetime of this reference
```
As you can see, the TyCtxt
type takes three lifetime parameters.
These lifetimes are perhaps the most complex thing to understand about
the tcx. During Rust compilation, we allocate most of our memory in
arenas, which are basically pools of memory that get freed all at
once. When you see a reference with a lifetime like 'tcx
or 'gcx
,
you know that it refers to arena-allocated data (or data that lives as
long as the arenas, anyhow).
We use two distinct levels of arenas. The outer level is the "global arena". This arena lasts for the entire compilation: so anything you allocate in there is only freed once compilation is basically over (actually, when we shift to executing LLVM).
To reduce peak memory usage, when we do type inference, we also use an inner level of arena. These arenas get thrown away once type inference is over. This is done because type inference generates a lot of "throw-away" types that are not particularly interesting after type inference completes, so keeping around those allocations would be wasteful.
Often, we wish to write code that explicitly asserts that it is not
taking place during inference. In that case, there is no "local"
arena, and all the types that you can access are allocated in the
global arena. To express this, the idea is to use the same lifetime
for the 'gcx
and 'tcx
parameters of TyCtxt
. Just to be a touch
confusing, we tend to use the name 'tcx
in such contexts. Here is an
example:
```rust
fn not_in_inference<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) {
    //                                        ----  ----
    //                                        Using the same lifetime here asserts
    //                                        that the innermost arena accessible through
    //                                        this reference *is* the global arena.
}
```
In contrast, if you want code that is usable during type inference, you need to declare distinct 'gcx and 'tcx lifetime parameters:
```rust
fn maybe_in_inference<'a, 'gcx, 'tcx>(tcx: TyCtxt<'a, 'gcx, 'tcx>, def_id: DefId) {
    //                                            ----  ----
    //                                            Using different lifetimes here means that
    //                                            the innermost arena *may* be distinct
    //                                            from the global arena (but doesn't have to be).
}
```
Allocating and working with types
Rust types are represented using the Ty<'tcx>
defined in the ty
module (not to be confused with the Ty
struct from the HIR). This
is in fact a simple type alias for a reference with 'tcx
lifetime:
```rust
pub type Ty<'tcx> = &'tcx TyS<'tcx>;
```
You can basically ignore the TyS struct – you will never
struct – you will basically never
access it explicitly. We always pass it by reference using the
Ty<'tcx>
alias – the only exception I think is to define inherent
methods on types. Instances of TyS
are only ever allocated in one of
the rustc arenas (never e.g. on the stack).
One common operation on types is to match and see what kinds of
types they are. This is done by doing match ty.sty
, sort of like this:
```rust
fn test_type<'tcx>(ty: Ty<'tcx>) {
    match ty.sty {
        ty::TyArray(elem_ty, len) => { ... }
        ...
    }
}
```
The sty
field (the origin of this name is unclear to me; perhaps
structural type?) is of type TyKind<'tcx>
, which is an enum
defining all of the different kinds of types in the compiler.
N.B. inspecting the
sty
field on types during type inference can be risky, as there may be inference variables and other things to consider, or sometimes types are not yet known but will become known later.
To allocate a new type, you can use the various mk_
methods defined
on the tcx
. These have names that correspond mostly to the various kinds
of type variants. For example:
```rust
let array_ty = tcx.mk_array(elem_ty, len * 2);
```
These methods all return a Ty<'tcx>
– note that the lifetime you
get back is the lifetime of the innermost arena that this tcx
has
access to. In fact, types are always canonicalized and interned (so we
never allocate exactly the same type twice) and are always allocated
in the outermost arena where they can be (so, if they do not contain
any inference variables or other "temporary" types, they will be
allocated in the global arena). However, the lifetime 'tcx
is always
a safe approximation, so that is what you get back.
NB. Because types are interned, it is possible to compare them for equality efficiently using
==
– however, this is almost never what you want to do unless you happen to be hashing and looking for duplicates. This is because often in Rust there are multiple ways to represent the same type, particularly once inference is involved. If you are going to be testing for type equality, you probably need to start looking into the inference code to do it right.
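A toy interner showing why interning makes equality cheap; the types here are illustrative stand-ins for rustc's arena-based machinery:

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash)]
enum TyKind {
    Bool,
    Array(Box<TyKind>, usize),
}

struct Interner {
    map: HashMap<TyKind, usize>,
    arena: Vec<TyKind>,
}

impl Interner {
    fn intern(&mut self, kind: TyKind) -> usize {
        if let Some(&idx) = self.map.get(&kind) {
            return idx; // already interned: same index, cheap equality
        }
        let idx = self.arena.len();
        self.arena.push(kind.clone());
        self.map.insert(kind, idx);
        idx
    }
}

fn main() {
    let mut interner = Interner { map: HashMap::new(), arena: Vec::new() };
    let a = interner.intern(TyKind::Array(Box::new(TyKind::Bool), 3));
    let b = interner.intern(TyKind::Array(Box::new(TyKind::Bool), 3));
    assert_eq!(a, b); // "the same type" interns to the same index
}
```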
You can also find various common types in the tcx
itself by accessing
tcx.types.bool
, tcx.types.char
, etc (see CommonTypes
for more).
Beyond types: other kinds of arena-allocated data structures
In addition to types, there are a number of other arena-allocated data structures that you can allocate, and which are found in this module. Here are a few examples:
- `Substs`, allocated with `mk_substs` – this will intern a slice of types, often used to specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
- `TraitRef`, typically passed by value – a trait reference consists of a reference to a trait along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id would reference the `Display` trait, and the substs would contain `i32`).
- `Predicate` defines something the trait system has to prove (see the `traits` module).
Import conventions
Although there is no hard and fast rule, the ty
module tends to be used like
so:
use ty::{self, Ty, TyCtxt};
In particular, since they are so common, the Ty
and TyCtxt
types
are imported directly. Other types are often referenced with an
explicit ty::
prefix (e.g. ty::TraitRef<'tcx>
). But some modules
choose to import a larger or smaller set of names explicitly.
Kinds
A ty::subst::Kind<'tcx>
represents some entity in the type system: a type
(Ty<'tcx>
), lifetime (ty::Region<'tcx>
) or constant (ty::Const<'tcx>
).
Kind
is used to perform substitutions of generic parameters for concrete
arguments, such as when calling a function with generic parameters explicitly
with type arguments. Substitutions are represented using the
Subst
type as described below.
Subst
ty::subst::Subst<'tcx>
is intuitively simply a slice of Kind<'tcx>
s,
acting as an ordered list of substitutions from generic parameters to
concrete arguments (such as types, lifetimes and consts).
For example, given a HashMap<K, V>
with two type parameters, K
and V
, an
instantiation of the parameters, for example HashMap<i32, u32>
, would be
represented by the substitution &'tcx [tcx.types.i32, tcx.types.u32]
.
Subst
provides various convenience methods to instantiate substitutions
given item definitions, which should generally be used rather than explicitly
constructing such substitution slices.
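As a sketch of what those convenience methods enable (names such as Substs::identity_for_item and the subst method follow the rustc of this era and may have moved or been renamed since; tcx and def_id are assumed to be in scope):
// Build the identity substitution for an item's generics and apply it;
// substituting each parameter for itself gives the type back unchanged.
let substs = Substs::identity_for_item(tcx, def_id);
let ty = tcx.type_of(def_id).subst(tcx, substs);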
Kind
The actual Kind
struct is optimised for space, storing the type, lifetime or
const as an interned pointer containing a tag identifying its kind (in the
lowest 2 bits). Unless you are working with the Subst
implementation
specifically, you should generally not have to deal with Kind
and instead
make use of the safe UnpackedKind
abstraction.
UnpackedKind
As Kind
itself is not type-safe, the UnpackedKind
enum provides a more
convenient and safe interface for dealing with kinds. An UnpackedKind
can
be converted to a raw Kind
using Kind::from()
(or simply .into()
when
the context is clear). As mentioned earlier, substitution lists store raw
Kind
s, so before dealing with them, it is preferable to convert them to
UnpackedKind
s first. This is done by calling the .unpack()
method.
// An example of unpacking and packing a kind.
fn deal_with_kind<'tcx>(kind: Kind<'tcx>) -> Kind<'tcx> {
// Unpack a raw `Kind` to deal with it safely.
    let new_kind: UnpackedKind<'tcx> = match kind.unpack() {
        // Each arm must produce an `UnpackedKind` again; here we just
        // repack the same value after inspecting it.
        UnpackedKind::Type(ty) => { /* ... */ UnpackedKind::Type(ty) }
        UnpackedKind::Lifetime(lt) => { /* ... */ UnpackedKind::Lifetime(lt) }
        UnpackedKind::Const(ct) => { /* ... */ UnpackedKind::Const(ct) }
    };
// Pack the `UnpackedKind` to store it in a substitution list.
new_kind.into()
}
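As a usage note, a Ty<'tcx> (or a region or const) can typically be packed into a Kind directly via the From impls mentioned above; a small sketch, assuming some ty: Ty<'tcx> in scope:
// Pack a type into a `Kind`, then recover it via `unpack()`.
let kind: Kind<'tcx> = ty.into();
match kind.unpack() {
    UnpackedKind::Type(t) => assert_eq!(t, ty),
    _ => unreachable!("we packed a type, so we get a type back"),
}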
Type inference
Type inference is the process of automatic detection of the type of an expression.
It is what allows Rust to work with fewer or no type annotations, making things easier for users:
fn main() {
let mut things = vec![];
things.push("thing")
}
Here, the element type of things is inferred to be &str because that's the type of value we push into things, so things itself gets the type Vec<&str>.
Type inference is based on the standard Hindley-Milner (HM) type inference algorithm, but extended in various ways to accommodate subtyping, region inference, and higher-ranked types.
A note on terminology
We use the notation ?T
to refer to inference variables, also called
existential variables.
We use the terms "region" and "lifetime" interchangeably. Both refer to
the 'a
in &'a T
.
The term "bound region" refers to a region that is bound in a function
signature, such as the 'a
in for<'a> fn(&'a u32)
. A region is
"free" if it is not bound.
Creating an inference context
You create and "enter" an inference context by doing something like the following:
tcx.infer_ctxt().enter(|infcx| {
// Use the inference context `infcx` here.
})
Each inference context creates a short-lived type arena to store the
fresh types and things that it will create, as described in the
chapter on the ty
module. This arena is created by the enter
function and disposed of after it returns.
Within the closure, infcx
has the type InferCtxt<'cx, 'gcx, 'tcx>
for some fresh 'cx
and 'tcx
– the latter corresponds to the lifetime of
this temporary arena, and the 'cx
is the lifetime of the InferCtxt
itself.
(Again, see the ty
chapter for more details on this setup.)
The tcx.infer_ctxt
method actually returns a builder, which means
there are some kinds of configuration you can do before the infcx
is
created. See InferCtxtBuilder
for more information.
Inference variables
The main purpose of the inference context is to house a bunch of inference variables – these represent types or regions whose precise value is not yet known, but will be uncovered as we perform type-checking.
If you're familiar with the basic ideas of unification from H-M type systems, or logic languages like Prolog, this is the same concept. If you're not, you might want to read a tutorial on how H-M type inference works, or perhaps this blog post on unification in the Chalk project.
All told, the inference context stores four kinds of inference variables as of this writing:
- Type variables, which come in three varieties:
  - General type variables (the most common). These can be unified with any type.
  - Integral type variables, which can only be unified with an integral type, and arise from an integer literal expression like 22.
  - Float type variables, which can only be unified with a float type, and arise from a float literal expression like 22.0.
- Region variables, which represent lifetimes, and arise all over the place.
All the type variables work in much the same way: you can create a new
type variable, and what you get is a Ty<'tcx>
representing an
unresolved type ?T
. Then later you can apply the various operations
that the inferencer supports, such as equality or subtyping, and it
will possibly instantiate (or bind) that ?T
to a specific
value as a result.
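Putting the pieces together, a hedged sketch of creating and constraining a type variable (method names such as next_ty_var and TypeVariableOrigin approximate this era's API and may differ; span is an assumed in-scope Span):
tcx.infer_ctxt().enter(|infcx| {
    // Create a fresh inference variable `?T`.
    let ty_var = infcx.next_ty_var(TypeVariableOrigin::MiscVariable(span));
    // Later operations (equality, subtyping, ...) may instantiate `?T`
    // to a concrete type as a side-effect.
});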
The region variables work somewhat differently, and are described below in a separate section.
Enforcing equality / subtyping
The most basic operation you can perform in the type inferencer is
equality, which forces two types T
and U
to be the same. The
recommended way to add an equality constraint is to use the at
method, roughly like so:
infcx.at(...).eq(t, u);
The first at()
call provides a bit of context, i.e. why you are
doing this unification, and in what environment, and the eq
method
performs the actual equality constraint.
When you equate things, you force them to be precisely equal. Equating
returns an InferResult
– if it returns Err(err)
, then equating
failed, and the enclosing TypeError
will tell you what went wrong.
The success case is perhaps more interesting. The "primary" return
type of eq
is ()
– that is, when it succeeds, it doesn't return a
value of any particular interest. Rather, it is executed for its
side-effects of constraining type variables and so forth. However, the
actual return type is not ()
, but rather InferOk<()>
. The
InferOk
type is used to carry extra trait obligations – your job is
to ensure that these are fulfilled (typically by enrolling them in a
fulfillment context). See the trait chapter for more background on that.
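Concretely, the success path might be handled roughly like this (a sketch: FulfillmentContext and register_predicate_obligations match the era this chapter describes, and cause, param_env, t and u are assumed to be in scope):
let mut fulfill_cx = FulfillmentContext::new();
match infcx.at(&cause, param_env).eq(t, u) {
    Ok(InferOk { value: (), obligations }) => {
        // Equality succeeded; enroll the extra obligations so that they
        // eventually get proven (or reported as errors).
        fulfill_cx.register_predicate_obligations(&infcx, obligations);
    }
    Err(type_err) => {
        // The types are definitely unequal; report `type_err`.
    }
}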
You can similarly enforce subtyping through infcx.at(..).sub(..)
. The same
basic concepts as above apply.
"Trying" equality
Sometimes you would like to know if it is possible to equate two
types without error. You can test that with infcx.can_eq
(or
infcx.can_sub
for subtyping). If this returns Ok
, then equality
is possible – but in all cases, any side-effects are reversed.
Be aware, though, that the success or failure of these methods is always
modulo regions. That is, two types &'a u32
and &'b u32
will
return Ok
for can_eq
, even if 'a != 'b
. This falls out from the
"two-phase" nature of how we solve region constraints.
Snapshots
As described in the previous section on can_eq
, often it is useful
to be able to do a series of operations and then roll back their
side-effects. This is done for various reasons: one of them is to be
able to backtrack, trying out multiple possibilities before settling
on which path to take. Another is in order to ensure that a series of
smaller changes take place atomically or not at all.
To allow for this, the inference context supports a snapshot
method.
When you call it, it will start recording changes that occur from the
operations you perform. When you are done, you can either invoke
rollback_to
, which will undo those changes, or else confirm
, which
will make them permanent. Snapshots can be nested as long as you follow
a stack-like discipline.
Rather than use snapshots directly, it is often helpful to use the
methods like commit_if_ok
or probe
that encapsulate higher-level
patterns.
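For instance, probe runs a closure inside a snapshot and unconditionally rolls back when it returns, which is a handy way to ask "what if?" questions (a sketch under the same assumptions as before):
// Check whether `t` and `u` could be equated, without committing to any
// side-effects of the attempt.
let could_eq = infcx.probe(|_snapshot| {
    infcx.at(&cause, param_env).eq(t, u).is_ok()
});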
Subtyping obligations
One thing worth discussing is subtyping obligations. When you force
two types to be a subtype, like ?T <: i32
, we can often convert those
into equality constraints. This follows from Rust's rather limited notion
of subtyping: so, in the above case, ?T <: i32
is equivalent to ?T = i32
.
However, in some cases we have to be more careful. For example, when
regions are involved. So if you have ?T <: &'a i32
, what we would do
is to first "generalize" &'a i32
into a type with a region variable:
&'?b i32
, and then unify ?T
with that (?T = &'?b i32
). We then
relate this new variable with the original bound:
&'?b i32 <: &'a i32
This will result in a region constraint (see below) of '?b: 'a
.
One final interesting case is relating two unbound type variables,
like ?T <: ?U
. In that case, we can't make progress, so we enqueue
an obligation Subtype(?T, ?U)
and return it via the InferOk
mechanism. You'll have to try again when more details about ?T
or
?U
are known.
Region constraints
Regions are inferred somewhat differently from types. Rather than eagerly unifying things, we simply collect constraints as we go, but make (almost) no attempt to solve regions. These constraints have the form of an "outlives" constraint:
'a: 'b
Actually the code tends to view them as a subregion relation, but it's the same idea:
'b <= 'a
(There are various other kinds of constraints, such as "verifys"; see
the region_constraints
module for details.)
There is one case where we do some amount of eager unification. If you have an equality constraint between two regions
'a = 'b
we will record that fact in a unification table. You can then use
opportunistic_resolve_var
to convert 'b
to 'a
(or vice
versa). This is sometimes needed to ensure termination of fixed-point
algorithms.
Extracting region constraints
Ultimately, region constraints are only solved at the very end of type-checking, once all other constraints are known. There are two ways to solve region constraints right now: lexical and non-lexical. Eventually there will only be one.
To solve lexical region constraints, you invoke
resolve_regions_and_report_errors
. This "closes" the region
constraint process and invokes the lexical_region_resolve
code. Once
this is done, any further attempt to equate or create a subtyping
relationship will yield an ICE.
Non-lexical region constraints are not handled within the inference
context. Instead, the NLL solver (actually, the MIR type-checker)
invokes take_and_reset_region_constraints
periodically. This
extracts all of the outlives constraints from the region solver, but
leaves the set of variables intact. This is used to get just the
region constraints that resulted from some particular point in the
program, since the NLL solver needs to know not just what regions
were subregions but where. Finally, the NLL solver invokes
take_region_var_origins
, which "closes" the region constraint
process in the same way as normal solving.
Lexical region resolution
Lexical region resolution is done by initially assigning each region variable to an empty value. We then process each outlives constraint repeatedly, growing region variables until a fixed-point is reached. Region variables can be grown using a least-upper-bound relation on the region lattice in a fairly straightforward fashion.
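The following toy model (ordinary Rust, not rustc code) shows the shape of that fixed-point loop: regions are modeled as sets of program points, the least upper bound is set union, and each constraint 'a: 'b forces the value of 'a to grow until it contains the value of 'b.
use std::collections::BTreeSet;
type Region = BTreeSet<u32>; // a region as a set of program points
// Each constraint (a, b) encodes 'a: 'b, i.e. values[a] must contain values[b].
fn solve(constraints: &[(usize, usize)], values: &mut Vec<Region>) {
    let mut changed = true;
    while changed {
        changed = false;
        for &(a, b) in constraints {
            let grown: Region = values[a].union(&values[b]).cloned().collect();
            if grown != values[a] {
                values[a] = grown; // grow 'a toward the least upper bound
                changed = true;
            }
        }
    }
}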
Trait resolution (old-style)
This chapter describes the general process of trait resolution and points out some non-obvious things.
Note: This chapter (and its subchapters) describe how the trait solver currently works. However, we are in the process of designing a new trait solver. If you'd prefer to read about that, see this traits chapter.
Major concepts
Trait resolution is the process of pairing up an impl with each reference to a trait. So, for example, if there is a generic function like:
fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> { ... }
and then a call to that function:
let v: Vec<isize> = clone_slice(&[1, 2, 3]);
it is the job of trait resolution to figure out whether there exists an impl of
(in this case) isize : Clone
.
Note that in some cases, like generic functions, we may not be able to
find a specific impl, but we can figure out that the caller must
provide an impl. For example, consider the body of clone_slice
:
fn clone_slice<T: Clone>(x: &[T]) -> Vec<T> {
    let mut v = Vec::new();
    for e in x {
        v.push((*e).clone()); // (*)
    }
    v
}
The line marked (*)
is only legal if T
(the type of *e
)
implements the Clone
trait. Naturally, since we don't know what T
is, we can't find the specific impl; but based on the bound T:Clone
,
we can say that there exists an impl which the caller must provide.
We use the term obligation to refer to a trait reference in need of an impl. Basically, the trait resolution system resolves an obligation by proving that an appropriate impl does exist.
During type checking, we do not store the results of trait selection. We simply wish to verify that trait selection will succeed. Then later, at trans time, when we have all concrete types available, we can repeat the trait selection to choose an actual implementation, which will then be generated in the output binary.
Overview
Trait resolution consists of three major parts:
-
Selection: Deciding how to resolve a specific obligation. For example, selection might decide that a specific obligation can be resolved by employing an impl which matches the
Self
type, or by using a parameter bound (e.g.T: Trait
). In the case of an impl, selecting one obligation can create nested obligations because of where clauses on the impl itself. It may also require evaluating those nested obligations to resolve ambiguities. -
Fulfillment: The fulfillment code is what tracks that obligations are completely fulfilled. Basically it is a worklist of obligations to be selected: once selection is successful, the obligation is removed from the worklist and any nested obligations are enqueued.
-
Coherence: The coherence checks are intended to ensure that there are never overlapping impls, where two impls could be used with equal precedence.
Selection
Selection is the process of deciding whether an obligation can be
resolved and, if so, how it is to be resolved (via impl, where clause, etc).
The main interface is the select()
function, which takes an obligation
and returns a SelectionResult
. There are three possible outcomes:
-
Ok(Some(selection))
– yes, the obligation can be resolved, andselection
indicates how. If the impl was resolved via an impl, thenselection
may also indicate nested obligations that are required by the impl. -
Ok(None)
– we are not yet sure whether the obligation can be resolved or not. This happens most commonly when the obligation contains unbound type variables. -
Err(err)
– the obligation definitely cannot be resolved due to a type error or because there are no impls that could possibly apply.
The basic algorithm for selection is broken into two big phases: candidate assembly and confirmation.
Note that because of how lifetime inference works, it is not possible to give back immediate feedback as to whether a unification or subtype relationship between lifetimes holds or not. Therefore, lifetime matching is not considered during selection. This is reflected in the fact that subregion assignment is infallible. This may yield lifetime constraints that will later be found to be in error (in contrast, the non-lifetime-constraints have already been checked during selection and can never cause an error, though naturally they may lead to other errors downstream).
Candidate assembly
Searches for impls/where-clauses/etc that might possibly be used to satisfy the obligation. Each of those is called a candidate. To avoid ambiguity, we want to find exactly one candidate that is definitively applicable. In some cases, we may not know whether an impl/where-clause applies or not – this occurs when the obligation contains unbound inference variables.
The subroutines that decide whether a particular impl/where-clause/etc
applies to a particular obligation are collectively referred to as the
process of matching. At the moment, this amounts to
unifying the Self
types, but in the future we may also recursively
consider some of the nested obligations, in the case of an impl.
TODO: what does "unifying the Self
types" mean? The Self
of the
obligation with that of an impl?
The basic idea for candidate assembly is to do a first pass in which we identify all possible candidates. During this pass, all that we do is try and unify the type parameters. (In particular, we ignore any nested where clauses.) Presuming that this unification succeeds, the impl is added as a candidate.
Once this first pass is done, we can examine the set of candidates. If it is a singleton set, then we are done: this is the only impl in scope that could possibly apply. Otherwise, we can winnow down the set of candidates by using where clauses and other conditions. If this reduced set yields a single, unambiguous entry, we're good to go, otherwise the result is considered ambiguous.
The basic process: Inferring based on the impls we see
This process is easier if we work through some examples. Consider the following trait:
trait Convert<Target> {
fn convert(&self) -> Target;
}
This trait just has one method. It's about as simple as it gets. It
converts from the (implicit) Self
type to the Target
type. If we
wanted to permit conversion between isize
and usize
, we might
implement Convert
like so:
impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize
Now imagine there is some code like the following:
let x: isize = ...;
let y = x.convert();
The call to convert will generate a trait reference Convert<$Y> for isize
, where $Y
is the type variable representing the type of
y
. Of the two impls we can see, the only one that matches is
Convert<usize> for isize
. Therefore, we can
select this impl, which will cause the type of $Y
to be unified to
usize
. (Note that while assembling candidates, we do the initial
unifications in a transaction, so that they don't affect one another.)
TODO: The example says we can "select" the impl, but this section is
talking specifically about candidate assembly. Does this mean we can sometimes
skip confirmation? Or is this poor wording?
TODO: Is the unification of $Y
part of trait resolution or type
inference? Or is this not the same type of "inference variable" as in type
inference?
Winnowing: Resolving ambiguities
But what happens if there are multiple impls where all the types unify? Consider this example:
trait Get {
    fn get(&self) -> Self;
}
impl<T: Copy> Get for T {
    fn get(&self) -> T { *self }
}
impl<T: Get> Get for Box<T> {
    fn get(&self) -> Box<T> { Box::new(get_it(&**self)) }
}
// A plausible definition of `get_it`, which the impl above assumes:
fn get_it<T: Get>(t: &T) -> T { t.get() }
What happens when we invoke get_it(&Box::new(1_u16))
, for example? In this
case, the Self
type is Box<u16>
– that unifies with both impls,
because the first applies to all types T
, and the second to all
Box<T>
. In order for this to be unambiguous, the compiler does a winnowing
pass that considers where
clauses
and attempts to remove candidates. In this case, the first impl only
applies if Box<u16> : Copy
, which doesn't hold. After winnowing,
then, we are left with just one candidate, so we can proceed.
where clauses
Besides an impl, the other major way to resolve an obligation is via a where clause. The selection process is always given a parameter environment which contains a list of where clauses, which are basically obligations that we can assume are satisfiable. We will iterate over that list and check whether our current obligation can be found in that list. If so, it is considered satisfied. More precisely, we want to check whether there is a where-clause obligation that is for the same trait (or some subtrait) and which can match against the obligation.
Consider this simple example:
trait A1 {
fn do_a1(&self);
}
trait A2 : A1 { ... }
trait B {
fn do_b(&self);
}
fn foo<X:A2+B>(x: X) {
x.do_a1(); // (*)
x.do_b(); // (#)
}
In the body of foo
, clearly we can use methods of A1
, A2
, or B
on variable x
. The line marked (*)
will incur an obligation X: A1
,
while the line marked (#)
will incur an obligation X: B
. Meanwhile,
the parameter environment will contain two where-clauses: X : A2
and X : B
.
For each obligation, then, we search this list of where-clauses. The
obligation X: B
trivially matches against the where-clause X: B
.
To resolve an obligation X:A1
, we would note that X:A2
implies that X:A1
.
Confirmation
Confirmation unifies the output type parameters of the trait with the values found in the obligation, possibly yielding a type error.
Suppose we have the following variation of the Convert
example in the
previous section:
trait Convert<Target> {
fn convert(&self) -> Target;
}
impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize
let x: isize = ...;
let y: char = x.convert(); // NOTE: `y: char` now!
Confirmation is where an error would be reported because the impl specified
that Target
would be usize
, but the obligation reported char
. Hence the
result of selection would be an error.
Note that the candidate impl is chosen based on the Self
type, but
confirmation is done based on (in this case) the Target
type parameter.
Selection during translation
As mentioned above, during type checking, we do not store the results of trait selection. At trans time, we repeat the trait selection to choose a particular impl for each method call. In this second selection, we do not consider any where-clauses to be in scope because we know that each resolution will resolve to a particular impl.
One interesting twist has to do with nested obligations. In general, in trans, we only need to do a "shallow" selection for an obligation. That is, we wish to identify which impl applies, but we do not (yet) need to decide how to select any nested obligations. Nonetheless, we do currently do a complete resolution, and that is because it can sometimes inform the results of type inference. That is, we do not have the full substitutions in terms of the type variables of the impl available to us, so we must run trait selection to figure everything out.
TODO: is this still talking about trans?
Here is an example:
trait Foo { ... }
impl<U, T:Bar<U>> Foo for Vec<T> { ... }
impl Bar<usize> for isize { ... }
After one shallow round of selection for an obligation like Vec<isize> : Foo
, we would know which impl we want, and we would know that
T=isize
, but we do not know the type of U
. We must select the
nested obligation isize : Bar<U>
to find out that U=usize
.
It would be good to only do just as much nested resolution as necessary. Currently, though, we just do a full resolution.
Higher-ranked trait bounds
One of the more subtle concepts in trait resolution is higher-ranked trait
bounds. An example of such a bound is for<'a> MyTrait<&'a isize>
.
Let's walk through how selection on higher-ranked trait references
works.
Basic matching and placeholder leaks
Suppose we have a trait Foo
:
trait Foo<X> {
    fn foo(&self, x: X) { }
}
Let's say we have a function want_hrtb
that wants a type which
implements Foo<&'a isize>
for any 'a
:
fn want_hrtb<T>() where T : for<'a> Foo<&'a isize> { ... }
Now we have a struct AnyInt
that implements Foo<&'a isize>
for any
'a
:
struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt { }
And the question is, does AnyInt : for<'a> Foo<&'a isize>
? We want the
answer to be yes. The algorithm for figuring it out is closely related
to the subtyping for higher-ranked types (which is described here
and also in a paper by SPJ. If you wish to understand higher-ranked
subtyping, we recommend you read the paper). There are a few parts:
- Replace bound regions in the obligation with placeholders.
- Match the impl against the placeholder obligation.
- Check for placeholder leaks.
So let's work through our example.
-
The first thing we would do is to replace the bound region in the obligation with a placeholder, yielding AnyInt : Foo<&'0 isize> (here '0 represents placeholder region #0). Note that we now have no quantifiers; in terms of the compiler types, this changes from a ty::PolyTraitRef to a TraitRef. We would then create the TraitRef from the impl, using fresh variables for its bound regions (and thus getting Foo<&'$a isize>, where '$a is the inference variable for 'a).
Next we relate the two trait refs, yielding a graph with the constraint that
'0 == '$a
. -
Finally, we check for placeholder "leaks" – a leak is basically any attempt to relate a placeholder region to another placeholder region, or to any region that pre-existed the impl match. The leak check is done by searching from the placeholder region to find the set of regions that it is related to in any way. This is called the "taint" set. To pass the check, that set must consist solely of itself and region variables from the impl. If the taint set includes any other region, then the match is a failure. In this case, the taint set for
'0
is{'0, '$a}
, and hence the check will succeed.
Let's consider a failure case. Imagine we also have a struct
struct StaticInt;
impl Foo<&'static isize> for StaticInt { }
We want the obligation StaticInt : for<'a> Foo<&'a isize>
to be
considered unsatisfied. The check begins just as before. 'a
is
replaced with a placeholder '0
and the impl trait reference is instantiated to
Foo<&'static isize>
. When we relate those two, we get a constraint
like 'static == '0
. This means that the taint set for '0
is {'0, 'static}
, which fails the leak check.
TODO: This is because 'static
is not a region variable but is in the
taint set, right?
Higher-ranked trait obligations
Once the basic matching is done, we get to another interesting topic:
how to deal with impl obligations. I'll work through a simple example
here. Imagine we have the traits Foo
and Bar
and an associated impl:
trait Foo<X> {
    fn foo(&self, x: X) { }
}
trait Bar<X> {
    fn bar(&self, x: X) { }
}
impl<X, F> Foo<X> for F where F: Bar<X> { }
Now let's say we have an obligation Baz: for<'a> Foo<&'a isize>
and we match
this impl. What obligation is generated as a result? We want to get
Baz: for<'a> Bar<&'a isize>
, but how does that happen?
After the matching, we are in a position where we have a placeholder
substitution like X => &'0 isize
. If we apply this substitution to the
impl obligations, we get F : Bar<&'0 isize>
. Obviously this is not
directly usable because the placeholder region '0
cannot leak out of
our computation.
What we do is to create an inverse mapping from the taint set of '0
back to the original bound region ('a
, here) that '0
resulted
from. (This is done in higher_ranked::plug_leaks
). We know that the
leak check passed, so this taint set consists solely of the placeholder
region itself plus various intermediate region variables. We then walk
the trait-reference and convert every region in that taint set back to
a late-bound region, so in this case we'd wind up with
Baz: for<'a> Bar<&'a isize>
.
Caching and subtle considerations therewith
In general, we attempt to cache the results of trait selection. This is a somewhat complex process. Part of the reason for this is that we want to be able to cache results even when all the types in the trait reference are not fully known. In that case, it may happen that the trait selection process is also influencing type variables, so we have to be able to not only cache the result of the selection process, but replay its effects on the type variables.
An example
The high-level idea of how the cache works is that we first replace
all unbound inference variables with placeholder versions. Therefore,
if we had a trait reference usize : Foo<$t>
, where $t
is an unbound
inference variable, we might replace it with usize : Foo<$0>
, where
$0
is a placeholder type. We would then look this up in the cache.
If we found a hit, the hit would tell us the immediate next step to
take in the selection process (e.g. apply impl #22, or apply where
clause X : Foo<Y>
).
On the other hand, if there is no hit, we need to go through the selection process from scratch. Suppose, we come to the conclusion that the only possible impl is this one, with def-id 22:
impl Foo<isize> for usize { ... } // Impl #22
We would then record in the cache usize : Foo<$0> => ImplCandidate(22)
. Next
we would confirm ImplCandidate(22)
, which would (as a side-effect) unify
$t
with isize
.
Now, at some later time, we might come along and see a usize : Foo<$u>
. When replaced with a placeholder, this would yield usize : Foo<$0>
, just as
before, and hence the cache lookup would succeed, yielding
ImplCandidate(22)
. We would confirm ImplCandidate(22)
which would
(as a side-effect) unify $u
with isize
.
Where clauses and the local vs global cache
One subtle interaction is that the results of trait lookup will vary
depending on what where clauses are in scope. Therefore, we actually
have two caches, a local and a global cache. The local cache is
attached to the ParamEnv
, and the global cache attached to the
tcx
. We use the local cache whenever the result might depend on the
where clauses that are in scope. The determination of which cache to
use is done by the method pick_candidate_cache
in select.rs
. At
the moment, we use a very simple, conservative rule: if there are any
where-clauses in scope, then we use the local cache. We used to try
and draw finer-grained distinctions, but that led to a series of
annoying and weird bugs like #22019 and #18290. This simple rule seems
to be pretty clearly safe and also still retains a very high hit rate
(~95% when compiling rustc).
TODO: it looks like pick_candidate_cache
no longer exists. In
general, is this section still accurate at all?
Specialization
TODO: where does Chalk fit in? Should we mention/discuss it here?
Defined in the specialize
module.
The basic strategy is to build up a specialization graph during coherence checking (recall that coherence checking looks for overlapping impls). Insertion into the graph locates the right place to put an impl in the specialization hierarchy; if there is no right place (due to partial overlap but no containment), you get an overlap error. Specialization is consulted when selecting an impl (of course), and the graph is consulted when propagating defaults down the specialization hierarchy.
You might expect that the specialization graph would be used during selection – i.e. when actually performing specialization. This is not done for two reasons:
-
It's merely an optimization: given a set of candidates that apply, we can determine the most specialized one by comparing them directly for specialization, rather than consulting the graph. Given that we also cache the results of selection, the benefit of this optimization is questionable.
-
To build the specialization graph in the first place, we need to use selection (because we need to determine whether one impl specializes another). Dealing with this reentrancy would require some additional mode switch for selection. Given that there seems to be no strong reason to use the graph anyway, we stick with a simpler approach in selection, and use the graph only for propagating default implementations.
Trait impl selection can succeed even when multiple impls can apply,
as long as they are part of the same specialization family. In that
case, it returns a single impl on success – this is the most
specialized impl known to apply. However, if there are any inference
variables in play, the returned impl may not be the actual impl we
will use at trans time. Thus, we take special care to avoid projecting
associated types unless either (1) the associated type does not use
default
and thus cannot be overridden or (2) all input types are
known concretely.
Additional Resources
This talk by @sunjay may be useful. Keep in mind that the talk only gives a broad overview of the problem and the solution (it was presented about halfway through @sunjay's work). Also, it was given in June 2018, and some things may have changed by the time you watch it.
Trait solving (new-style)
🚧 This chapter describes "new-style" trait solving. This is still in the process of being implemented; this chapter serves as a kind of in-progress design document. If you would prefer to read about how the current trait solver works, check out this other chapter. 🚧
By the way, if you would like to help in hacking on the new solver, you will find instructions for getting involved in the Traits Working Group tracking issue!
The new-style trait solver is based on the work done in chalk. Chalk recasts Rust's trait system explicitly in terms of logic programming. It does this by "lowering" Rust code into a kind of logic program we can then execute queries against.
You can read more about chalk itself in the Overview of Chalk section.
Trait solving in rustc is based around a few key ideas:
- Lowering to logic, which expresses
Rust traits in terms of standard logical terms.
- The goals and clauses chapter describes the precise form of rules we use, and lowering rules gives the complete set of lowering rules in a more reference-like form.
- Lazy normalization, which is the technique we use to accommodate associated types when figuring out whether types are equal.
- Region constraints, which are accumulated during trait solving but mostly ignored. This means that trait solving effectively ignores the precise regions involved, always – but we still remember the constraints on them so that those constraints can be checked by the type checker.
- Canonical queries, which allow us
to solve trait problems (like "is Foo implemented for the type Bar?") once, and then apply that same result independently in many different inference contexts.
This is not a complete list of topics. See the sidebar for more.
Ongoing work
The design of the new-style trait solving currently happens in two places:
chalk. The chalk repository is where we experiment with new ideas and designs for the trait system. It primarily consists of two parts:
- a unit testing framework for the correctness and feasibility of the logical rules defining the new-style trait system.
- the
chalk_engine
crate, which defines the new-style trait solver used both in the unit testing framework and in rustc.
rustc. Once we are happy with the logical rules, we proceed to
implementing them in rustc. This mainly happens in
librustc_traits
.
Lowering to logic
The key observation here is that the Rust trait system is basically a kind of logic, and it can be mapped onto standard logical inference rules. We can then look for solutions to those inference rules in a very similar fashion to how e.g. a Prolog solver works. It turns out that we can't quite use Prolog rules (also called Horn clauses) but rather need a somewhat more expressive variant.
Rust traits and logic
One of the first observations is that the Rust trait system is basically a kind of logic. As such, we can map our struct, trait, and impl declarations into logical inference rules. For the most part, these are basically Horn clauses, though we'll see that to capture the full richness of Rust – and in particular to support generic programming – we have to go a bit further than standard Horn clauses.
To see how this mapping works, let's start with an example. Imagine we declare a trait and a few impls, like so:
trait Clone { }
impl Clone for usize { }
impl<T> Clone for Vec<T> where T: Clone { }
We could map these declarations to some Horn clauses, written in a Prolog-like notation, as follows:
Clone(usize).
Clone(Vec<?T>) :- Clone(?T).
// The notation `A :- B` means "A is true if B is true".
// Or, put another way, B implies A.
In Prolog terms, we might say that Clone(Foo)
– where Foo
is some
Rust type – is a predicate that represents the idea that the type
Foo
implements Clone
. These rules are program clauses; they
state the conditions under which that predicate can be proven (i.e.,
considered true). So the first rule just says "Clone is implemented
for usize
". The next rule says "for any type ?T
, Clone is
implemented for Vec<?T>
if clone is implemented for ?T
". So
e.g. if we wanted to prove that Clone(Vec<Vec<usize>>)
, we would do
so by applying the rules recursively:
- Clone(Vec<Vec<usize>>) is provable if:
  - Clone(Vec<usize>) is provable if:
    - Clone(usize) is provable. (Which it is, so we're all good.)
But now suppose we tried to prove that Clone(Vec<Bar>)
. This would
fail (after all, I didn't give an impl of Clone
for Bar
):
- Clone(Vec<Bar>) is provable if:
  - Clone(Bar) is provable. (But it is not, as there are no applicable rules.)
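This recursive rule application is easy to model in code. Here is a standalone toy (not chalk or rustc): a depth-first search over ground clauses of the form head :- body, enough to reproduce both derivations above. Since the toy has no variables, the Vec rule is instantiated at each ground type, and cycles are not detected.
struct Clause {
    head: &'static str,
    body: Vec<&'static str>,
}
// A goal is provable if some clause with a matching head has a body whose
// goals are all provable in turn (depth-first).
fn provable(goal: &str, clauses: &[Clause]) -> bool {
    clauses.iter().any(|c| {
        c.head == goal && c.body.iter().all(|g| provable(g, clauses))
    })
}
fn main() {
    let clauses = vec![
        // Clone(usize).
        Clause { head: "Clone(usize)", body: vec![] },
        // Clone(Vec<usize>) :- Clone(usize).
        Clause { head: "Clone(Vec<usize>)", body: vec!["Clone(usize)"] },
        // Clone(Vec<Vec<usize>>) :- Clone(Vec<usize>).
        Clause { head: "Clone(Vec<Vec<usize>>)", body: vec!["Clone(Vec<usize>)"] },
    ];
    assert!(provable("Clone(Vec<Vec<usize>>)", &clauses));
    assert!(!provable("Clone(Vec<Bar>)", &clauses)); // no applicable rules
}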
We can easily extend the example above to cover generic traits with
more than one input type. So imagine the Eq<T>
trait, which declares
that Self
is equatable with a value of type T
:
trait Eq<T> { ... }
impl Eq<usize> for usize { }
impl<T: Eq<U>> Eq<Vec<U>> for Vec<T> { }
That could be mapped as follows:
Eq(usize, usize).
Eq(Vec<?T>, Vec<?U>) :- Eq(?T, ?U).
So far so good.
Type-checking normal functions
OK, now that we have defined some logical rules that are able to express when traits are implemented and to handle associated types, let's turn our focus a bit towards type-checking. Type-checking is interesting because it is what gives us the goals that we need to prove. That is, everything we've seen so far has been about how we derive the rules by which we can prove goals from the traits and impls in the program; but we are also interested in how to derive the goals that we need to prove, and those come from type-checking.
Consider type-checking the function foo()
here:
fn foo() { bar::<usize>() }
fn bar<U: Eq<U>>() { }
This function is very simple, of course: all it does is to call
bar::<usize>()
. Now, looking at the definition of bar()
, we can see
that it has one where-clause U: Eq<U>
. So, that means that foo()
will
have to prove that usize: Eq<usize>
in order to show that it can call bar()
with usize
as the type argument.
If we wanted, we could write a Prolog predicate that defines the
conditions under which bar()
can be called. We'll say that those
conditions are called being "well-formed":
barWellFormed(?U) :- Eq(?U, ?U).
Then we can say that foo()
type-checks if the reference to
bar::<usize>
(that is, bar()
applied to the type usize
) is
well-formed:
fooTypeChecks :- barWellFormed(usize).
If we try to prove the goal fooTypeChecks
, it will succeed:
fooTypeChecks
is provable if:barWellFormed(usize)
, which is provable if:Eq(usize, usize)
, which is provable because of an impl.
Ok, so far so good. Let's move on to type-checking a more complex function.
Type-checking generic functions: beyond Horn clauses
In the last section, we used standard Prolog Horn clauses (augmented with Rust's
notion of type equality) to type-check some simple Rust functions. But that only
works when we are type-checking non-generic functions. If we want to type-check
a generic function, it turns out we need a stronger notion of goal than what Prolog
can provide. To see what I'm talking about, let's revamp our previous
example to make foo
generic:
fn foo<T: Eq<T>>() { bar::<T>() }
fn bar<U: Eq<U>>() { }
To type-check the body of foo
, we need to be able to hold the type
T
"abstract". That is, we need to check that the body of foo
is
type-safe for all types T
, not just for some specific type. We might express
this like so:
fooTypeChecks :-
// for all types T...
forall<T> {
// ...if we assume that Eq(T, T) is provable...
if (Eq(T, T)) {
// ...then we can prove that `barWellFormed(T)` holds.
barWellFormed(T)
}
}.
The notation I'm using here is the one I've been using in my
prototype implementation; it's similar to standard mathematical
notation but a bit Rustified. Anyway, the problem is that standard
Horn clauses don't allow universal quantification (forall
) or
implication (if
) in goals (though many Prolog engines do support
them, as an extension). For this reason, we need to accept something
called "first-order hereditary harrop" (FOHH) clauses – this long
name basically means "standard Horn clauses with forall
and if
in
the body". But it's nice to know the proper name, because there is a
lot of work describing how to efficiently handle FOHH clauses; see for
example Gopalan Nadathur's excellent
"A Proof Procedure for the Logic of Hereditary Harrop Formulas"
in the bibliography.
It turns out that supporting FOHH is not really all that hard. And
once we are able to do that, we can easily describe the type-checking
rule for generic functions like foo
in our logic.
Source
This page is a lightly adapted version of a blog post by Nicholas Matsakis.
Goals and clauses
In logic programming terms, a goal is something that you must prove and a clause is something that you know is true. As described in the lowering to logic chapter, Rust's trait solver is based on an extension of hereditary harrop (HH) clauses, which extend traditional Prolog Horn clauses with a few new superpowers.
Goals and clauses meta structure
In Rust's solver, goals and clauses have the following forms (note that the two definitions reference one another):
Goal = DomainGoal // defined in the section below
| Goal && Goal
| Goal || Goal
| exists<K> { Goal } // existential quantification
| forall<K> { Goal } // universal quantification
| if (Clause) { Goal } // implication
| true // something that's trivially true
| ambiguous // something that's never provable
Clause = DomainGoal
| Clause :- Goal // if can prove Goal, then Clause is true
| Clause && Clause
| forall<K> { Clause }
K = <type> // a "kind"
| <lifetime>
The proof procedure for these sorts of goals is actually quite straightforward. Essentially, it's a form of depth-first search. The paper "A Proof Procedure for the Logic of Hereditary Harrop Formulas" gives the details.
In terms of code, these types are defined in
librustc/traits/mod.rs
in rustc, and in
chalk-ir/src/lib.rs
in chalk.
Domain goals
Domain goals are the atoms of the trait logic. As can be seen in the definitions given above, general goals basically consist in a combination of domain goals.
Moreover, flattening a bit the definition of clauses given previously, one can see that clauses are always of the form:
forall<K1, ..., Kn> { DomainGoal :- Goal }
hence domain goals are in fact clauses' LHS. That is, at the most granular level, domain goals are what the trait solver will end up trying to prove.
To define the set of domain goals in our system, we need to first introduce a few simple formulations. A trait reference consists of the name of a trait along with a suitable set of inputs P0..Pn:
TraitRef = P0: TraitName<P1..Pn>
So, for example, u32: Display
is a trait reference, as is Vec<T>: IntoIterator
. Note that Rust surface syntax also permits some extra
things, like associated type bindings (Vec<T>: IntoIterator<Item = T>
), that are not part of a trait reference.
A projection consists of an associated item reference along with its inputs P0..Pm:
Projection = <P0 as TraitName<P1..Pn>>::AssocItem<Pn+1..Pm>
Given these, we can define a DomainGoal
as follows:
DomainGoal = Holds(WhereClause)
| FromEnv(TraitRef)
| FromEnv(Type)
| WellFormed(TraitRef)
| WellFormed(Type)
| Normalize(Projection -> Type)
WhereClause = Implemented(TraitRef)
| ProjectionEq(Projection = Type)
| Outlives(Type: Region)
| Outlives(Region: Region)
WhereClause
refers to a where
clause that a Rust user would actually be able
to write in a Rust program. This abstraction exists only as a convenience as we
sometimes want to only deal with domain goals that are effectively writable in
Rust.
Let's break down each one of these, one-by-one.
Implemented(TraitRef)
e.g. Implemented(i32: Copy)
True if the given trait is implemented for the given input types and lifetimes.
ProjectionEq(Projection = Type)
e.g. ProjectionEq(<T as Iterator>::Item = u8)
The given associated type Projection
is equal to Type
; this can be proved
with either normalization or using placeholder associated types. See
the section on associated types.
Normalize(Projection -> Type)
e.g. Normalize(<T as Iterator>::Item -> u8)
The given associated type Projection
can be normalized to Type
.
As discussed in the section on associated
types, Normalize
implies ProjectionEq
,
but not vice versa. In general, proving Normalize(<T as Trait>::Item -> U)
also requires proving Implemented(T: Trait)
.
FromEnv(TraitRef)
e.g. FromEnv(Self: Add<i32>)
True if the inner TraitRef
is assumed to be true,
that is, if it can be derived from the in-scope where clauses.
For example, given the following function:
fn loud_clone<T: Clone>(stuff: &T) -> T {
    println!("cloning!");
    stuff.clone()
}
Inside the body of our function, we would have FromEnv(T: Clone)
. In-scope
where clauses nest, so a function body inside an impl body inherits the
impl body's where clauses, too.
This and the next rule are used to implement implied bounds. As we'll see
in the section on lowering, FromEnv(TraitRef)
implies Implemented(TraitRef)
,
but not vice versa. This distinction is crucial to implied bounds.
FromEnv(Type)
e.g. FromEnv(HashSet<K>)
True if the inner Type
is assumed to be well-formed, that is, if it is an
input type of a function or an impl.
For example, given the following code:
struct HashSet<K> where K: Hash { ... }
fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
println!("inserting!");
set.insert(item);
}
HashSet<K>
is an input type of the loud_insert
function. Hence, we assume it
to be well-formed, so we would have FromEnv(HashSet<K>)
inside the body of our
function. As we'll see in the section on lowering, FromEnv(HashSet<K>)
implies
Implemented(K: Hash)
because the
HashSet
declaration was written with a K: Hash
where clause. Hence, we don't
need to repeat that bound on the loud_insert
function: we rather automatically
assume that it is true.
WellFormed(Item)
These goals imply that the given item is well-formed.
We can talk about different types of items being well-formed:
-
Types, like WellFormed(Vec<i32>), which is true in Rust, or WellFormed(Vec<str>), which is not (because str is not Sized).
TraitRefs, like
WellFormed(Vec<i32>: Clone)
.
Well-formedness is important to implied bounds. In particular, the reason
it is okay to assume FromEnv(T: Clone)
in the loud_clone
example is that we
also verify WellFormed(T: Clone)
for each call site of loud_clone
.
Similarly, it is okay to assume FromEnv(HashSet<K>)
in the loud_insert
example because we will verify WellFormed(HashSet<K>)
for each call site of
loud_insert
.
Outlives(Type: Region), Outlives(Region: Region)
e.g. Outlives(&'a str: 'b)
, Outlives('a: 'static)
True if the given type or region on the left outlives the right-hand region.
Coinductive goals
Most goals in our system are "inductive". In an inductive goal, circular reasoning is disallowed. Consider this example clause:
Implemented(Foo: Bar) :-
Implemented(Foo: Bar).
Considered inductively, this clause is useless: if we are trying to
prove Implemented(Foo: Bar)
, we would then recursively have to prove
Implemented(Foo: Bar)
, and that cycle would continue ad infinitum
(the trait solver will terminate here, it would just consider that
Implemented(Foo: Bar)
is not known to be true).
However, some goals are co-inductive. Simply put, this means that
cycles are OK. So, if Bar
were a co-inductive trait, then the rule
above would be perfectly valid, and it would indicate that
Implemented(Foo: Bar)
is true.
Auto traits are one example in Rust where co-inductive goals are used.
Consider the Send
trait, and imagine that we have this struct:
struct Foo {
    next: Option<Box<Foo>>
}
The default rules for auto traits say that Foo
is Send
if the
types of its fields are Send
. Therefore, we would have a rule like
Implemented(Foo: Send) :-
Implemented(Option<Box<Foo>>: Send).
As you can probably imagine, proving that Option<Box<Foo>>: Send
is
going to wind up circularly requiring us to prove that Foo: Send
again. So this would be an example where we wind up in a cycle – but
that's ok, we do consider Foo: Send
to hold, even though it
references itself.
In general, co-inductive traits are used in Rust trait solving when we
want to enumerate a fixed set of possibilities. In the case of auto
traits, we are enumerating the set of reachable types from a given
starting point (i.e., Foo
can reach values of type
Option<Box<Foo>>
, which implies it can reach values of type
Box<Foo>
, and then of type Foo
, and then the cycle is complete).
In addition to auto traits, WellFormed
predicates are co-inductive.
These are used to achieve a similar "enumerate all the cases" pattern,
as described in the section on implied bounds.
Incomplete chapter
Some topics yet to be written:
- Elaborate on the proof procedure
- SLG solving – introduce negative reasoning
Equality and associated types
This section covers how the trait system handles equality between associated types. The full system consists of several moving parts, which we will introduce one by one:
- Projection and the Normalize predicate
- Placeholder associated type projections
- The ProjectionEq predicate
- Integration with unification
Associated type projection and normalization
When a trait defines an associated type (e.g.,
the Item
type in the IntoIterator
trait), that
type can be referenced by the user using an associated type
projection like <Option<u32> as IntoIterator>::Item
.
Often, people will use the shorthand syntax
T::Item
. Presently, that syntax is expanded during "type collection" into the explicit form, though that is something we may want to change in the future.
In some cases, associated type projections can be normalized –
that is, simplified – based on the types given in an impl. So, to
continue with our example, the impl of IntoIterator
for Option<T>
declares (among other things) that Item = T
:
impl<T> IntoIterator for Option<T> {
type Item = T;
...
}
This means we can normalize the projection <Option<u32> as IntoIterator>::Item
to just u32
.
In this case, the projection was a "monomorphic" one – that is, it did not have any type parameters. Monomorphic projections are special because they can always be fully normalized.
Often, we can normalize other associated type projections as well. For
example, <Option<?T> as IntoIterator>::Item
, where ?T
is an inference
variable, can be normalized to just ?T
.
In our logic, normalization is defined by a predicate
Normalize
. The Normalize
clauses arise only from
impls. For example, the impl
of IntoIterator
for Option<T>
that
we saw above would be lowered to a program clause like so:
forall<T> {
Normalize(<Option<T> as IntoIterator>::Item -> T) :-
Implemented(Option<T>: IntoIterator)
}
where in this case, the one Implemented
condition is always true.
Since we do not permit quantification over traits, this is really more like a family of program clauses, one for each associated type.
We could apply that rule to normalize either of the examples that we've seen so far.
Placeholder associated types
Sometimes however we want to work with associated types that cannot be normalized. For example, consider this function:
fn foo<T: IntoIterator>(...) { ... }
In this context, how would we normalize the type T::Item
?
Without knowing what T
is, we can't really do so. To represent this case,
we introduce a type called a placeholder associated type projection. This
is written like so: (IntoIterator::Item)<T>
.
You may note that it looks a lot like a regular type (e.g., Option<T>
),
except that the "name" of the type is (IntoIterator::Item)
. This is not an
accident: placeholder associated type projections work just like ordinary
types like Vec<T>
when it comes to unification. That is, they are only
considered equal if (a) they are both references to the same associated type,
like IntoIterator::Item
and (b) their type arguments are equal.
Placeholder associated types are never written directly by the user. They are used internally by the trait system only, as we will see shortly.
In rustc, they correspond to the TyKind::UnnormalizedProjectionTy
enum
variant, declared in librustc/ty/sty.rs
. In chalk, we use an
ApplicationTy
with a name living in a special namespace dedicated to
placeholder associated types (see the TypeName
enum declared in
chalk-ir/src/lib.rs
).
Projection equality
So far we have seen two ways to answer the question of "When can we consider an associated type projection equal to another type?":
- the Normalize predicate could be used to transform projections when we knew which impl applied;
- placeholder associated types can be used when we don't. This is also known as lazy normalization.
We now introduce the ProjectionEq
predicate to bring those two cases
together. The ProjectionEq
predicate looks like so:
ProjectionEq(<T as IntoIterator>::Item = U)
and we will see that it can be proven either via normalization or
via the placeholder type. As part of lowering an associated type declaration from
some trait, we create two program clauses for ProjectionEq
:
forall<T, U> {
ProjectionEq(<T as IntoIterator>::Item = U) :-
Normalize(<T as IntoIterator>::Item -> U)
}
forall<T> {
ProjectionEq(<T as IntoIterator>::Item = (IntoIterator::Item)<T>)
}
These are the only two ProjectionEq
program clauses we ever make for
any given associated item.
Integration with unification
Now we are ready to discuss how associated type equality integrates with unification. As described in the type inference section, unification is basically a procedure with a signature like this:
Unify(A, B) = Result<(Subgoals, RegionConstraints), NoSolution>
In other words, we try to unify two things A and B. That procedure
might just fail, in which case we get back Err(NoSolution)
. This
would happen, for example, if we tried to unify u32
and i32
.
The key point is that, on success, unification can also give back to us a set of subgoals that still remain to be proven. (It can also give back region constraints, but those are not relevant here).
Whenever unification encounters a non-placeholder associated type
projection P being equated with some other type T, it always succeeds,
but it produces a subgoal ProjectionEq(P = T)
that is propagated
back up. Thus it falls to the ordinary workings of the trait system
to process that constraint.
If we unify two projections P1 and P2, then unification produces a variable X and asks us to prove that ProjectionEq(P1 = X) and ProjectionEq(P2 = X). (That used to be needed in an older system to prevent cycles; I rather doubt it still is. -nmatsakis)
Implied Bounds
Implied bounds remove the need to repeat where clauses written on a type declaration or a trait declaration. For example, say we have the following type declaration:
struct HashSet<K: Hash> {
...
}
then everywhere we use HashSet<K>
as an "input" type, that is appearing in
the receiver type of an impl
or in the arguments of a function, we don't
want to have to repeat the where K: Hash
bound, as in:
// I don't want to have to repeat `where K: Hash` here.
impl<K> HashSet<K> {
...
}
// Same here.
fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
println!("inserting!");
set.insert(item);
}
Note that in the loud_insert
example, HashSet<K>
is not the type
of the set
argument of loud_insert
, it only appears in the
argument type &mut HashSet<K>
: we care about every type appearing
in the function's header (the header is the signature without the return type),
not only types of the function's arguments.
The rationale for applying implied bounds to input types is that, for example,
in order to call the loud_insert
function above, the programmer must have
produced the type HashSet<K>
already, hence the compiler already verified
that HashSet<K>
was well-formed, i.e. that K
effectively implemented
Hash
, as in the following example:
fn main() {
// I am producing a value of type `HashSet<i32>`.
// If `i32` was not `Hash`, the compiler would report an error here.
let mut set: HashSet<i32> = HashSet::new();
loud_insert(&mut set, 5);
}
Hence, we don't want to repeat where clauses for input types because that would, in a sense, duplicate the work of the programmer, who would have to verify that their types are well-formed both when calling the function and when using them in the arguments of their function. The same reasoning applies when using an `impl`.
Similarly, given the following trait declaration:
```rust
trait Copy where Self: Clone { // desugared version of `Copy: Clone`
    ...
}
```

then everywhere we bound over `SomeType: Copy`, we would like to be able to use the fact that `SomeType: Clone` without having to write it explicitly, as in:

```rust
fn loud_clone<T: Clone>(x: T) {
    println!("cloning!");
    x.clone();
}

fn fun_with_copy<T: Copy>(x: T) {
    println!("will clone a `Copy` type soon...");

    // I'm using `loud_clone<T: Clone>` with `T: Copy`, I know this
    // implies `T: Clone` so I don't want to have to write it explicitly.
    loud_clone(x);
}
```
The rationale for implied bounds for traits is that if a type implements `Copy`, that is, if there exists an `impl Copy` for that type, there ought to exist an `impl Clone` for that type, otherwise the compiler would have reported an error in the first place. So again, if we were forced to repeat the additional `where SomeType: Clone` everywhere even though we already know that `SomeType: Copy` holds, we would kind of duplicate the verification work.
Implied bounds are not yet completely enforced in rustc; at the moment they only work for outlives requirements, super trait bounds, and bounds on associated types. The full RFC can be found here. We'll give here a brief overview of how implied bounds work and why we chose to implement them that way. The complete set of lowering rules can be found in the corresponding chapter.
Implied bounds and lowering rules
Now we need to express implied bounds in terms of logical rules. We will start by exposing a naive way to do it. Suppose that we have the following traits:
```rust
trait Foo {
    ...
}

trait Bar where Self: Foo {
    ...
}
```
So we would like to say that if a type implements `Bar`, then necessarily it must also implement `Foo`. We might think that a clause like this would work:

```text
forall<Type> {
    Implemented(Type: Foo) :- Implemented(Type: Bar).
}
```
Now suppose that we just write this impl:

```rust
struct X;

impl Bar for X { }
```

Clearly this should not be allowed: indeed, we wrote a `Bar` impl for `X`, but the `Bar` trait requires that we also implement `Foo` for `X`, which we never did. In terms of what the compiler does, this would look like this:

```rust
struct X;

impl Bar for X {
    // We are in a `Bar` impl for the type `X`.
    // There is a `where Self: Foo` bound on the `Bar` trait declaration.
    // Hence I need to prove that `X` also implements `Foo` for that impl
    // to be legal.
}
```
So the compiler would try to prove `Implemented(X: Foo)`. Of course it will not find any `impl Foo for X` since we did not write any. However, it will see our implied bound clause:

```text
forall<Type> {
    Implemented(Type: Foo) :- Implemented(Type: Bar).
}
```

so that it may be able to prove `Implemented(X: Foo)` if `Implemented(X: Bar)` holds. And it turns out that `Implemented(X: Bar)` does hold since we wrote a `Bar` impl for `X`! Hence the compiler will accept the `Bar` impl while it should not.
Implied bounds coming from the environment
So the naive approach does not work. What we need to do is to somehow decouple implied bounds from impls. Suppose we know that a type `SomeType<...>` implements `Bar` and we want to deduce that `SomeType<...>` must also implement `Foo`.

There are two possibilities: first, we have enough information about `SomeType<...>` to see that there exists a `Bar` impl in the program which covers `SomeType<...>`, for example a plain `impl<...> Bar for SomeType<...>`. Then if the compiler has done its job correctly, there must exist a `Foo` impl which covers `SomeType<...>`, e.g. another plain `impl<...> Foo for SomeType<...>`. In that case, we can just use this impl and we do not need implied bounds at all.
Second possibility: we do not know enough about `SomeType<...>` in order to find a `Bar` impl which covers it, for example if `SomeType<...>` is just a type parameter in a function:

```rust
fn foo<T: Bar>() {
    // We'd like to deduce `Implemented(T: Foo)`.
}
```

That is, the information that `T` implements `Bar` here comes from the environment. The environment is the set of things that we assume to be true when we type check some Rust declaration. In that case, what we assume is that `T: Bar`. Then at that point, we might authorize ourselves to have some kind of "local" implied bound reasoning, which would say `Implemented(T: Foo) :- Implemented(T: Bar)`. This reasoning would only be done within our `foo` function, in order to avoid the earlier problem where we had a global clause.
We can apply these local reasonings everywhere we can have an environment -- i.e. when we can write where clauses -- that is, inside impls, trait declarations, and type declarations.
Computing implied bounds with FromEnv
The previous subsection showed that it was only useful to compute implied bounds for facts coming from the environment. We talked about "local" rules, but there are multiple possible strategies for actually implementing the locality of implied bounds.
In rustc, the current strategy is to elaborate bounds: that is, each time we have a fact in the environment, we recursively derive all the other things that are implied by this fact until we reach a fixed point. For example, if we have the following declarations:
```rust
trait A { }
trait B where Self: A { }
trait C where Self: B { }

fn foo<T: C>() {
    ...
}
```

then inside the `foo` function, we start with an environment containing only `Implemented(T: C)`. Then because of implied bounds for the `C` trait, we elaborate `Implemented(T: B)` and add it to our environment. Because of implied bounds for the `B` trait, we elaborate `Implemented(T: A)` and add it to our environment as well. We cannot elaborate anything else, so we conclude that our final environment consists of `Implemented(T: A + B + C)`.
In the new-style trait system, we like to encode as many things as possible with logical rules. So rather than "elaborating", we have a set of global program clauses defined like so:
```text
forall<T> { Implemented(T: A) :- FromEnv(T: A). }

forall<T> { Implemented(T: B) :- FromEnv(T: B). }
forall<T> { FromEnv(T: A) :- FromEnv(T: B). }

forall<T> { Implemented(T: C) :- FromEnv(T: C). }
forall<T> { FromEnv(T: B) :- FromEnv(T: C). }
```
So these clauses are defined globally (that is, they are available from everywhere in the program) but they cannot be used because the hypothesis is always of the form `FromEnv(...)`, which is a bit special. Indeed, as indicated by the name, `FromEnv(...)` facts can only come from the environment.

How it works is that in the `foo` function, instead of having an environment containing `Implemented(T: C)`, we replace this environment with `FromEnv(T: C)`. From here and thanks to the above clauses, we see that we are able to reach any of `Implemented(T: A)`, `Implemented(T: B)` or `Implemented(T: C)`, which is what we wanted.
Implied bounds and well-formedness checking
Implied bounds are tightly related with well-formedness checking. Well-formedness checking is the process of checking that the impls the programmer wrote are legal, what we referred to earlier as "the compiler doing its job correctly".
We already saw examples of illegal and legal impls:
```rust
trait Foo { }
trait Bar where Self: Foo { }

struct X;
struct Y;

impl Bar for X {
    // This impl is not legal: the `Bar` trait requires that we also
    // implement `Foo`, and we didn't.
}

impl Foo for Y {
    // This impl is legal: there is nothing to check as there are no where
    // clauses on the `Foo` trait.
}

impl Bar for Y {
    // This impl is legal: we have a `Foo` impl for `Y`.
}
```
We must define what "legal" and "illegal" mean. For this, we introduce another predicate: `WellFormed(Type: Trait)`. We say that the trait reference `Type: Trait` is well-formed if `Type` meets the bounds written on the `Trait` declaration. For each impl we write, assuming that the where clauses declared on the impl hold, the compiler tries to prove that the corresponding trait reference is well-formed. The impl is legal if the compiler manages to do so.

Coming to the definition of `WellFormed(Type: Trait)`, it would be tempting to define it as:

```rust
trait Trait where WC1, WC2, ..., WCn {
    ...
}
```

```text
forall<Type> {
    WellFormed(Type: Trait) :- WC1 && WC2 && .. && WCn.
}
```
and indeed this was basically what was done in rustc until it was noticed that this mixed badly with implied bounds. The key thing is that implied bounds allow someone to derive all bounds implied by a fact in the environment, and this transitively, as we've seen with the `A + B + C` traits example. However, the `WellFormed` predicate as defined above only checks that the direct superbounds hold. That is, if we come back to our `A + B + C` example:
```rust
trait A { }
// No where clauses, always well-formed.
// forall<Type> { WellFormed(Type: A). }

trait B where Self: A { }
// We only check the direct superbound `Self: A`.
// forall<Type> { WellFormed(Type: B) :- Implemented(Type: A). }

trait C where Self: B { }
// We only check the direct superbound `Self: B`. We do not check
// the `Self: A` implied bound coming from the `Self: B` superbound.
// forall<Type> { WellFormed(Type: C) :- Implemented(Type: B). }
```
There is an asymmetry between the recursive power of implied bounds and the shallow checking of `WellFormed`. It turns out that this asymmetry can be exploited. Indeed, suppose that we define the following traits:

```rust
trait Partial where Self: Copy { }
// WellFormed(Self: Partial) :- Implemented(Self: Copy).

trait Complete where Self: Partial { }
// WellFormed(Self: Complete) :- Implemented(Self: Partial).

impl<T> Partial for T where T: Complete { }

impl<T> Complete for T { }
```

For the `Partial` impl, what the compiler must prove is:

```text
forall<T> {
    if (T: Complete) { // assume that the where clauses hold
        WellFormed(T: Partial) // show that the trait reference is well-formed
    }
}
```
Proving `WellFormed(T: Partial)` amounts to proving `Implemented(T: Copy)`. However, we have `Implemented(T: Complete)` in our environment: thanks to implied bounds, we can deduce `Implemented(T: Partial)`. Using implied bounds one level deeper, we can deduce `Implemented(T: Copy)`. Finally, the `Partial` impl is legal.

For the `Complete` impl, what the compiler must prove is:

```text
forall<T> {
    WellFormed(T: Complete) // show that the trait reference is well-formed
}
```

Proving `WellFormed(T: Complete)` amounts to proving `Implemented(T: Partial)`. We see that the `impl Partial for T` applies if we can prove `Implemented(T: Complete)`, and it turns out we can prove this fact since our `impl<T> Complete for T` is a blanket impl without any where clauses.
So both impls are legal and the compiler accepts the program. Moreover, thanks to the `Complete` blanket impl, all types implement `Complete`. So we could now use this impl like so:

```rust
fn eat<T>(x: T) { }

fn copy_everything<T: Complete>(x: T) {
    eat(x);
    eat(x);
}

fn main() {
    let not_copiable = vec![1, 2, 3, 4];
    copy_everything(not_copiable);
}
```
In this program, we use the fact that `Vec<i32>` implements `Complete`, like any other type. Hence we can call `copy_everything` with an argument of type `Vec<i32>`. Inside the `copy_everything` function, we have the `Implemented(T: Complete)` bound in our environment. Thanks to implied bounds, we can deduce `Implemented(T: Partial)`. Using implied bounds again, we deduce `Implemented(T: Copy)` and we can indeed call the `eat` function, which moves the argument twice since its argument is `Copy`. Problem: the `T` type was in fact `Vec<i32>`, which is not `Copy` at all, hence we will double-free the underlying vector storage: we have memory unsoundness in safe Rust.

Of course, disregarding the asymmetry between `WellFormed` and implied bounds, this bug was possible only because we had some kind of self-referencing impls. But self-referencing impls are very useful in practice and are not the real culprits in this affair.
Co-inductiveness of WellFormed
So the solution is to fix this asymmetry between `WellFormed` and implied bounds. For that, we need the `WellFormed` predicate to require not only that the direct superbounds hold, but also all the bounds transitively implied by the superbounds. What we can do is to have the following rules for the `WellFormed` predicate:

```rust
trait A { }
// WellFormed(Self: A) :- Implemented(Self: A).

trait B where Self: A { }
// WellFormed(Self: B) :- Implemented(Self: B) && WellFormed(Self: A).

trait C where Self: B { }
// WellFormed(Self: C) :- Implemented(Self: C) && WellFormed(Self: B).
```

Notice that we are now also requiring `Implemented(Self: Trait)` for `WellFormed(Self: Trait)` to be true: this is to simplify the process of traversing all the implied bounds transitively. This does not change anything when checking whether impls are legal, because since we assume that the where clauses hold inside the impl, we know that the corresponding trait references do hold. Thanks to this setup, you can see that we indeed require proving the set of all bounds transitively implied by the where clauses.
However there is still a catch. Suppose that we have the following trait definition:

```rust
trait Foo where <Self as Foo>::Item: Foo {
    type Item;
}
```

so this definition is a bit more involved than the ones we've seen already because it defines an associated item. However, the well-formedness rule would not be more complicated:

```text
WellFormed(Self: Foo) :-
    Implemented(Self: Foo) &&
    WellFormed(<Self as Foo>::Item: Foo).
```

Now we would like to write the following impl:

```rust
impl Foo for i32 {
    type Item = i32;
}
```
The `Foo` trait definition and the `impl Foo for i32` are perfectly valid Rust: we're kind of recursively using our `Foo` impl in order to show that the associated value indeed implements `Foo`, but that's ok. But if we translate this to our well-formedness setting, the compiler proof process inside the `Foo` impl is the following: it starts with proving that the well-formedness goal `WellFormed(i32: Foo)` is true. In order to do that, it must prove the following goals: `Implemented(i32: Foo)` and `WellFormed(<i32 as Foo>::Item: Foo)`. `Implemented(i32: Foo)` holds because there is our impl and there are no where clauses on it, so it's always true. However, because of the associated type value we used, `WellFormed(<i32 as Foo>::Item: Foo)` simplifies to just `WellFormed(i32: Foo)`. So in order to prove its original goal `WellFormed(i32: Foo)`, the compiler needs to prove `WellFormed(i32: Foo)`: this clearly is a cycle, and cycles are usually rejected by the trait solver, unless the `WellFormed` predicate is made co-inductive.
A co-inductive predicate, as discussed in the chapter on goals and clauses, is a predicate for which the trait solver accepts cycles. In our setting, this would be a valid thing to do: indeed, the `WellFormed` predicate just serves as a way of enumerating all the implied bounds. Hence, it's like a fixed point algorithm: it tries to grow the set of implied bounds until there is nothing more to add. Here, a cycle in the chain of `WellFormed` predicates just means that there are no more bounds to add in that direction, so we can just accept this cycle and focus on other directions. It's easy to prove that under these co-inductive semantics, we effectively visit all the transitive implied bounds, and only those.
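A toy, hedged sketch of what "accepting cycles" means operationally, with goals represented as plain strings and a caller-supplied `subgoals` function (both are simplifications; the real solver works on canonicalized goals):

```rust
use std::collections::HashSet;

// Depth-first proof search in which a goal already on the stack (a
// cycle) counts as success: the co-inductive reading.
fn prove_coinductive(
    goal: &str,
    subgoals: &impl Fn(&str) -> Vec<String>,
    stack: &mut HashSet<String>,
) -> bool {
    if stack.contains(goal) {
        return true; // cycle: nothing more to add in this direction
    }
    stack.insert(goal.to_string());
    let proven = subgoals(goal)
        .iter()
        .all(|g| prove_coinductive(g, subgoals, stack));
    stack.remove(goal);
    proven
}
```

With the inductive reading, the cycle branch would return `false` instead, and the `impl Foo for i32` above would be wrongly rejected.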
Implied bounds on types
We mainly talked about implied bounds for traits because this was the most subtle regarding implementation. Implied bounds on types are simpler, especially because if we assume that a type is well-formed, we don't use that fact to deduce that other types are well-formed; we only use it to deduce that, e.g., some trait bounds hold.
For types, we just use rules like these ones:

```rust
struct Type<...> where WC1, ..., WCn {
    ...
}
```

```text
forall<...> {
    WellFormed(Type<...>) :- WC1, ..., WCn.
}

forall<...> {
    FromEnv(WC1) :- FromEnv(Type<...>).
    ...
    FromEnv(WCn) :- FromEnv(Type<...>).
}
```
We can see that we still have the asymmetry between the well-formedness check, which only verifies that the direct superbounds hold, and implied bounds, which give access to all bounds transitively implied by the where clauses. In this case it is ok because, as we said, we don't use `FromEnv(Type<...>)` to deduce other `FromEnv(OtherType<...>)` things, nor do we use `FromEnv(Type: Trait)` to deduce `FromEnv(OtherType<...>)` things. So in that sense type definitions are "less recursive" than traits, and we saw in a previous subsection that it was the combination of asymmetry and recursive traits / impls that led to unsoundness. As such, the `WellFormed(Type<...>)` predicate does not need to be co-inductive.
This asymmetry optimization is useful because in a real Rust program, we have to check the well-formedness of types very often (e.g. for each type which appears in the body of a function).
Region constraints
To be written.
Chalk does not have the concept of region constraints, and as of this writing, work on rustc had not progressed far enough to worry about them.
In the meantime, you can read about region constraints in the type inference section.
The lowering module in rustc
The program clauses described in the lowering rules section are actually created in the `rustc_traits::lowering` module.

The `program_clauses_for` query

The main entry point is the `program_clauses_for` query, which – given a def-id – produces a set of Chalk program clauses. These queries are tested using a dedicated unit-testing mechanism, described below. The query is invoked on a `DefId` that identifies something like a trait, an impl, or an associated item definition. It then produces and returns a vector of program clauses.
Unit tests
Unit tests are located in `src/test/ui/chalkify`. A good example test is the `lower_impl` test. At the time of this writing, it looked like this:

```rust
#![feature(rustc_attrs)]

trait Foo { }

#[rustc_dump_program_clauses] //~ ERROR program clause dump
impl<T: 'static> Foo for T where T: Iterator<Item = i32> { }

fn main() {
    println!("hello");
}
```
The `#[rustc_dump_program_clauses]` annotation can be attached to anything with a def-id. (It requires the `rustc_attrs` feature.) The compiler will then invoke the `program_clauses_for` query on that item, and emit compiler errors that dump the clauses produced. These errors just exist for unit-testing, as we can then leverage the standard ui test mechanisms to check them. In this case, there is a `//~ ERROR program clause dump` annotation which is always the same for `#[rustc_dump_program_clauses]`, but the stderr file contains the full details:

```text
error: program clause dump
  --> $DIR/lower_impl.rs:5:1
   |
LL | #[rustc_dump_program_clauses]
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: forall<T> { Implemented(T: Foo) :- ProjectionEq(<T as std::iter::Iterator>::Item == i32), TypeOutlives(T: 'static), Implemented(T: std::iter::Iterator), Implemented(T: std::marker::Sized). }
```
Lowering rules
This section gives the complete lowering rules for Rust traits into program clauses. It is a kind of reference. These rules reference the domain goals defined in an earlier section.
Notation
The nonterminal `Pi` is used to mean some generic parameter, either a named lifetime like `'a` or a type parameter like `A`.

The nonterminal `Ai` is used to mean some generic argument, which might be a lifetime like `'a` or a type like `Vec<A>`.

When defining the lowering rules, we will give goals and clauses in the notation given in this section. We sometimes insert "macros" like `LowerWhereClause!` into these definitions; these macros reference other sections within this chapter.
Rule names and cross-references
Each of these lowering rules is given a name, documented with a comment like so:
```text
// Rule Foo-Bar-Baz
```

The reference implementation of these rules is to be found in `chalk/chalk-solve/src/clauses.rs`. They are also ported in rustc in the `librustc_traits` crate.
Lowering where clauses
When used in a goal position, where clauses can be mapped directly to the `Holds` variant of domain goals, as follows:

- `A0: Foo<A1..An>` maps to `Implemented(A0: Foo<A1..An>)`
- `T: 'r` maps to `Outlives(T, 'r)`
- `'a: 'b` maps to `Outlives('a, 'b)`
- `A0: Foo<A1..An, Item = T>` is a bit special and expands to two distinct goals, namely `Implemented(A0: Foo<A1..An>)` and `ProjectionEq(<A0 as Foo<A1..An>>::Item = T)`

In the rules below, we will use `WC` to indicate where clauses that appear in Rust syntax; we will then use the same `WC` to indicate where those where clauses appear as goals in the program clauses that we are producing. In that case, the mapping above is used to convert from the Rust syntax into goals.
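For example, under this mapping, the where clauses of a function like the hypothetical one below:

```rust
fn example<T>(_t: T)
where
    T: Iterator<Item = u32>,
    T: 'static,
{ }
```

lower to the goals `Implemented(T: Iterator)`, `ProjectionEq(<T as Iterator>::Item = u32)`, and `Outlives(T, 'static)`.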
Transforming the lowered where clauses
In addition, in the rules below, we sometimes do some transformations on the lowered where clauses, as defined here:
- `FromEnv(WC)` – this indicates that:
  - `Implemented(TraitRef)` becomes `FromEnv(TraitRef)`
  - other where-clauses are left intact
- `WellFormed(WC)` – this indicates that:
  - `Implemented(TraitRef)` becomes `WellFormed(TraitRef)`
  - other where-clauses are left intact
TODO: I suspect that we want to alter the outlives relations too, but Chalk isn't modeling those right now.
Lowering traits
Given a trait definition
```rust
trait Trait<P1..Pn> // P0 == Self
where WC
{
    // trait items
}
```

we will produce a number of declarations. This section is focused on the program clauses for the trait header (i.e., the stuff outside the `{}`); the section on trait items covers the stuff inside the `{}`.
Trait header
From the trait itself we mostly make "meta" rules that set up the relationships between different kinds of domain goals. The first such rule from the trait header creates the mapping between the `FromEnv` and `Implemented` predicates:

```text
// Rule Implemented-From-Env
forall<Self, P1..Pn> {
    Implemented(Self: Trait<P1..Pn>) :- FromEnv(Self: Trait<P1..Pn>)
}
```
Implied bounds
The next few clauses have to do with implied bounds (see also RFC 2089 and the implied bounds chapter for more in-depth coverage). For each trait, we produce two clauses:
```text
// Rule Implied-Bound-From-Trait
//
// For each where clause WC:
forall<Self, P1..Pn> {
    FromEnv(WC) :- FromEnv(Self: Trait<P1..Pn>)
}
```
This clause says that if we are assuming that the trait holds, then we can also assume that its where-clauses hold. It's perhaps useful to see an example:
```rust
trait Eq: PartialEq { ... }
```

In this case, the `PartialEq` supertrait is equivalent to a `where Self: PartialEq` where clause, in our simplified model. The program clause above therefore states that if we can prove `FromEnv(T: Eq)` – e.g., if we are in some function with `T: Eq` in its where clauses – then we also know that `FromEnv(T: PartialEq)`. Thus the set of things that follow from the environment are not only the direct where clauses but also things that follow from them.
The next rule is related; it defines what it means for a trait reference to be well-formed:
```text
// Rule WellFormed-TraitRef
forall<Self, P1..Pn> {
    WellFormed(Self: Trait<P1..Pn>) :- Implemented(Self: Trait<P1..Pn>) && WellFormed(WC)
}
```

This `WellFormed` rule states that `T: Trait` is well-formed if (a) `T: Trait` is implemented and (b) all the where-clauses declared on `Trait` are well-formed (and hence they are implemented). Remember that the `WellFormed` predicate is coinductive; in this case, it is serving as a kind of "carrier" that allows us to enumerate all the where clauses that are transitively implied by `T: Trait`.
An example:
```rust
trait Foo: A + Bar { }
trait Bar: B + Foo { }
trait A { }
trait B { }
```

Here, the transitive set of implications for `T: Foo` are `T: A`, `T: Bar`, and `T: B`. And indeed if we were to try to prove `WellFormed(T: Foo)`, we would have to prove each one of those:

```text
WellFormed(T: Foo)
    Implemented(T: Foo)
    WellFormed(T: A)
        Implemented(T: A)
    WellFormed(T: Bar)
        Implemented(T: Bar)
        WellFormed(T: B)
            Implemented(T: B)
        WellFormed(T: Foo) -- cycle, true coinductively
```
This `WellFormed` predicate is only used when proving that impls are well-formed – basically, for each impl of some trait ref `TraitRef`, we must show that `WellFormed(TraitRef)` holds. This in turn justifies the implied bounds rules that allow us to extend the set of `FromEnv` items.
Lowering type definitions
We also want to have some rules which define when a type is well-formed. For example, given this type:
```rust
struct Set<K> where K: Hash { ... }
```

then `Set<i32>` is well-formed because `i32` implements `Hash`, but `Set<NotHash>` would not be well-formed. Basically, a type is well-formed if its parameters verify the where clauses written on the type definition.

Hence, for every type definition:

```rust
struct Type<P1..Pn> where WC { ... }
```

we produce the following rule:

```text
// Rule WellFormed-Type
forall<P1..Pn> {
    WellFormed(Type<P1..Pn>) :- WC
}
```

Note that we use `struct` to define a type, but this should be understood as a general type definition (it could be e.g. a generic `enum`).
Conversely, we define rules which say that if we assume that a type is well-formed, we can also assume that its where clauses hold. That is, we produce the following family of rules:
```text
// Rule Implied-Bound-From-Type
//
// For each where clause `WC`
forall<P1..Pn> {
    FromEnv(WC) :- FromEnv(Type<P1..Pn>)
}
```
As per the implied bounds RFC, functions will assume that their arguments are well-formed. For example, suppose we have the following bit of code:

```rust
trait Hash: Eq { }
struct Set<K: Hash> { ... }

fn foo<K>(collection: Set<K>, x: K, y: K) {
    // `x` and `y` can be equalized even if we did not explicitly write
    // `where K: Eq`
    if x == y {
        ...
    }
}
```
In the `foo` function, we assume that `Set<K>` is well-formed, i.e. we have `FromEnv(Set<K>)` in our environment. Because of the previous rule, we get `FromEnv(K: Hash)` without needing an explicit where clause. And because of the `Hash` trait definition, there also exists a rule which says:

```text
forall<K> {
    FromEnv(K: Eq) :- FromEnv(K: Hash)
}
```

which means that we finally get `FromEnv(K: Eq)` and then can compare `x` and `y` without needing an explicit where clause.
Lowering trait items
Associated type declarations
Given a trait that declares a (possibly generic) associated type:
```rust
trait Trait<P1..Pn> // P0 == Self
where WC
{
    type AssocType<Pn+1..Pm>: Bounds where WC1;
}
```

We will produce a number of program clauses. The first two define the rules by which `ProjectionEq` can succeed; these two clauses are discussed in detail in the section on associated types, but reproduced here for reference:

```text
// Rule ProjectionEq-Normalize
//
// ProjectionEq can succeed by normalizing:
forall<Self, P1..Pn, Pn+1..Pm, U> {
    ProjectionEq(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> = U) :-
        Normalize(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> -> U)
}

// Rule ProjectionEq-Placeholder
//
// ProjectionEq can succeed through the placeholder associated type,
// see "associated type" chapter for more:
forall<Self, P1..Pn, Pn+1..Pm> {
    ProjectionEq(
        <Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> =
            (Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>
    )
}
```
The next rule covers implied bounds for the projection. In particular, the `Bounds` declared on the associated type must have been proven to hold to show that the impl is well-formed, and hence we can rely on them elsewhere.

```text
// Rule Implied-Bound-From-AssocTy
//
// For each `Bound` in `Bounds`:
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm>: Bound) :-
        FromEnv(Self: Trait<P1..Pn>) && WC1
}
```
Next, we define the requirements for an instantiation of our associated type to be well-formed...
```text
// Rule WellFormed-AssocTy
forall<Self, P1..Pn, Pn+1..Pm> {
    WellFormed((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>) :-
        Implemented(Self: Trait<P1..Pn>) && WC1
}
```

...along with the reverse implications, used when we can assume that it is well-formed:

```text
// Rule Implied-WC-From-AssocTy
//
// For each where clause WC1:
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(WC1) :- FromEnv((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>)
}

// Rule Implied-Trait-From-AssocTy
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(Self: Trait<P1..Pn>) :-
        FromEnv((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>)
}
```
Lowering function and constant declarations
Chalk didn't model functions and constants, but I would eventually like to treat them exactly like normalization. See the section on function/constant values below for more details.
Lowering impls
Given an impl of a trait:
```rust
impl<P0..Pn> Trait<A1..An> for A0
where WC
{
    // zero or more impl items
}
```

Let `TraitRef` be the trait reference `A0: Trait<A1..An>`. Then we will create the following rule:

```text
// Rule Implemented-From-Impl
forall<P0..Pn> {
    Implemented(TraitRef) :- WC
}
```
In addition, we will lower all of the impl items.
Lowering impl items
Associated type values
Given an impl that contains:
```rust
impl<P0..Pn> Trait<P1..Pn> for P0
where WC_impl
{
    type AssocType<Pn+1..Pm> = T;
}
```

and our where clause `WC1` on the trait associated type from above, we produce the following rule:

```text
// Rule Normalize-From-Impl
forall<P0..Pm> {
    forall<Pn+1..Pm> {
        Normalize(<P0 as Trait<P1..Pn>>::AssocType<Pn+1..Pm> -> T) :-
            Implemented(P0 as Trait) && WC1
    }
}
```

Note that `WC_impl` and `WC1` both encode where-clauses that the impl can rely on. (`WC_impl` is not used here, because it is implied by `Implemented(P0 as Trait)`.)
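As a hedged illustration (the `Container` and `MyVec` names are ours, not from chalk or rustc), an associated type value like the one in this impl:

```rust
trait Container {
    type Elem;
}

struct MyVec<T> {
    data: Vec<T>,
}

impl<T> Container for MyVec<T> {
    type Elem = T;
}
// This impl would produce, roughly, the program clause:
//
// forall<T> {
//     Normalize(<MyVec<T> as Container>::Elem -> T) :-
//         Implemented(MyVec<T>: Container)
// }
```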
Function and constant values
Chalk didn't model functions and constants, but I would eventually like to treat them exactly like normalization. This presumably involves adding a new kind of parameter (constant), and then having a `NormalizeValue` domain goal. This is to be written, because the details are a bit up in the air.
Well-formedness checking
WF checking has the job of checking that the various declarations in a Rust program are well-formed. This is the basis for implied bounds, and partly for that reason, this checking can be surprisingly subtle! For example, we have to be sure that each impl proves the WF conditions declared on the trait.
For each declaration in a Rust program, we will generate a logical goal and try to prove it using the lowered rules we described in the lowering rules chapter. If we are able to prove it, we say that the construct is well-formed. If not, we report an error to the user.
Well-formedness checking happens in the `chalk/chalk-solve/src/wf.rs` module in chalk. After you have read this chapter, you may find it useful to see an extended set of examples in the `chalk/src/test/wf.rs` submodule.
The new-style WF checking has not been implemented in rustc yet.
We give here a complete reference of the generated goals for each Rust declaration.
In addition to the notations introduced in the chapter about lowering rules, we'll introduce another notation: when checking WF of a declaration, we'll often have to prove that all types that appear are well-formed, except type parameters, which we always assume to be WF. Hence, we'll use the following notation: for a type `SomeType<...>`, we define `InputTypes(SomeType<...>)` to be the set of all non-parameter types appearing in `SomeType<...>`, including `SomeType<...>` itself.

Examples:
- `InputTypes((u32, f32)) = [u32, f32, (u32, f32)]`
- `InputTypes(Box<T>) = [Box<T>]` (assuming that `T` is a type parameter)
- `InputTypes(Box<Box<T>>) = [Box<T>, Box<Box<T>>]`

We also extend the `InputTypes` notation to where clauses in the natural way. So, for example, `InputTypes(A0: Trait<A1,...,An>)` is the union of `InputTypes(A0)`, `InputTypes(A1)`, ..., `InputTypes(An)`.
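A minimal sketch of `InputTypes` over a toy type representation (the `Ty` enum is hypothetical, for illustration only):

```rust
// A toy type: either a parameter like `T`, or an application like
// `Box<T>` = Apply("Box", [Param("T")]).
#[derive(Clone)]
enum Ty {
    Param(String),
    Apply(String, Vec<Ty>),
}

fn input_types(ty: &Ty, out: &mut Vec<Ty>) {
    match ty {
        // Type parameters are always assumed WF: never collected.
        Ty::Param(_) => {}
        Ty::Apply(_, args) => {
            // Collect the non-parameter type itself, plus every
            // non-parameter type nested inside it.
            out.push(ty.clone());
            for arg in args {
                input_types(arg, out);
            }
        }
    }
}
```

On `Box<Box<T>>` this collects `Box<Box<T>>` and `Box<T>` but not `T`, matching the examples above (as a set; the traversal order differs).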
Type definitions
Given a general type definition:
```rust
struct Type<P...> where WC_type {
    field1: A1,
    ...
    fieldn: An,
}
```

we generate the following goal, which represents its well-formedness condition:

```text
forall<P...> {
    if (FromEnv(WC_type)) {
        WellFormed(InputTypes(WC_type)) &&
            WellFormed(InputTypes(A1)) &&
            ...
            WellFormed(InputTypes(An))
    }
}
```
which in English states: assuming that the where clauses defined on the type hold, prove that every type appearing in the type definition is well-formed.
Some examples:
```rust
struct OnlyClone<T> where T: Clone {
    clonable: T,
}
// The only types appearing are type parameters: we have nothing to check,
// the type definition is well-formed.

struct Foo<T> where T: Clone {
    foo: OnlyClone<T>,
}
// The only non-parameter type which appears in this definition is
// `OnlyClone<T>`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(T: Clone)) {
//         WellFormed(OnlyClone<T>)
//     }
// }
// ```
// which is provable.

struct Bar<T> where <T as Iterator>::Item: Debug {
    bar: i32,
}
// The only non-parameter types which appear in this definition are
// `<T as Iterator>::Item` and `i32`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(<T as Iterator>::Item: Debug)) {
//         WellFormed(<T as Iterator>::Item) &&
//         WellFormed(i32)
//     }
// }
// ```
// which is not provable since `WellFormed(<T as Iterator>::Item)` requires
// proving `Implemented(T: Iterator)`, and we are unable to prove that for an
// unknown `T`.
//
// Hence, this type definition is considered illegal. An additional
// `where T: Iterator` would make it legal.
```
Trait definitions
Given a general trait definition:
```rust
trait Trait<P1...> where WC_trait {
    type Assoc<P2...>: Bounds_assoc where WC_assoc;
}
```

we generate the following goal:

```text
forall<P1...> {
    if (FromEnv(WC_trait)) {
        WellFormed(InputTypes(WC_trait)) &&

        forall<P2...> {
            if (FromEnv(WC_assoc)) {
                WellFormed(InputTypes(Bounds_assoc)) &&
                WellFormed(InputTypes(WC_assoc))
            }
        }
    }
}
```
There is not much to verify in a trait definition. We just want to prove that the types appearing in the trait definition are well-formed, under the assumption that the different where clauses hold.
Some examples:
```rust
trait Foo<T> where T: Iterator, <T as Iterator>::Item: Debug {
    ...
}
// The only non-parameter type which appears in this definition is
// `<T as Iterator>::Item`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(T: Iterator), FromEnv(<T as Iterator>::Item: Debug)) {
//         WellFormed(<T as Iterator>::Item)
//     }
// }
// ```
// which is provable thanks to the `FromEnv(T: Iterator)` assumption.

trait Bar {
    type Assoc<T>: From<<T as Iterator>::Item>;
}
// The only non-parameter type which appears in this definition is
// `<T as Iterator>::Item`. The generated goal is the following:
// ```
// forall<T> {
//     WellFormed(<T as Iterator>::Item)
// }
// ```
// which is not provable, hence the trait definition is considered illegal.

trait Baz {
    type Assoc<T>: From<<T as Iterator>::Item> where T: Iterator;
}
// The generated goal is now:
// ```
// forall<T> {
//     if (FromEnv(T: Iterator)) {
//         WellFormed(<T as Iterator>::Item)
//     }
// }
// ```
// which is now provable.
```
Impls
Now we give ourselves a general impl for the trait defined above:
```rust
impl<P1...> Trait<A1...> for SomeType<A2...> where WC_impl {
    type Assoc<P2...> = SomeValue<A3...> where WC_assoc;
}
```

Note that here, `WC_assoc` are the same where clauses as those defined on the associated type definition in the trait declaration, except that type parameters from the trait are substituted with values provided by the impl (see example below). You cannot add new where clauses. You may omit to write the where clauses if you want to emphasize the fact that you are actually not relying on them.
Some examples to illustrate that:
```rust
trait Foo<T> {
    type Assoc where T: Clone;
}

struct OnlyClone<T: Clone> { ... }

impl<U> Foo<Option<U>> for () {
    // We substitute type parameters from the trait by the ones provided
    // by the impl, that is instead of having a `T: Clone` where clause,
    // we have an `Option<U>: Clone` one.
    type Assoc = OnlyClone<Option<U>> where Option<U>: Clone;
}

impl<T> Foo<T> for i32 {
    // I'm not using the `T: Clone` where clause from the trait, so I can
    // omit it.
    type Assoc = u32;
}

impl<T> Foo<T> for f32 {
    type Assoc = OnlyClone<Option<T>> where Option<T>: Clone;
    //                                ^^^^^^^^^^^^^^^^^^^^^^
    //                                this where clause does not exist
    //                                on the original trait decl: illegal
}
```
So in Rust, where clauses on associated types work exactly like where clauses on trait methods: in an impl, we must substitute the parameters from the traits with values provided by the impl, we may omit them if we don't need them, but we cannot add new where clauses.
Now let's see the generated goal for this general impl:
```text
forall<P1...> {
    // Well-formedness of types appearing in the impl
    if (FromEnv(WC_impl), FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) {
        WellFormed(InputTypes(WC_impl)) &&

        forall<P2...> {
            if (FromEnv(WC_assoc)) {
                WellFormed(InputTypes(SomeValue<A3...>))
            }
        }
    }

    // Implied bounds checking
    if (FromEnv(WC_impl), FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) {
        WellFormed(SomeType<A2...>: Trait<A1...>) &&

        forall<P2...> {
            if (FromEnv(WC_assoc)) {
                WellFormed(SomeValue<A3...>: Bounds_assoc)
            }
        }
    }
}
```
Here is the most complex goal. As always, first, assuming that the various where clauses hold, we prove that every type appearing in the impl is well-formed, except the types appearing in the impl header `SomeType<A2...>: Trait<A1...>`. Instead, we assume that those types are well-formed (hence the `if (FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>)))` conditions). This is part of the implied bounds proposal, so that we can rely on the bounds written on the definition of, e.g., the `SomeType<A2...>` type (and so that we don't need to repeat those bounds).

Note that we don't need to check the well-formedness of types appearing in `WC_assoc` because we already did that in the trait declaration (they are just repeated with some substitutions of values which we already assume to be well-formed).

Next, still assuming that the where clauses on the impl `WC_impl` hold and that the input types of `SomeType<A2...>` are well-formed, we prove that `WellFormed(SomeType<A2...>: Trait<A1...>)` holds. That is, we want to prove that `SomeType<A2...>` verifies all the where clauses that might transitively be required by the `Trait` definition (see this subsection).

Lastly, assuming in addition that the where clauses on the associated type `WC_assoc` hold, we prove that `WellFormed(SomeValue<A3...>: Bounds_assoc)` holds. Again, we are not only proving `Implemented(SomeValue<A3...>: Bounds_assoc)`, but also all the facts that might transitively come from `Bounds_assoc`. We must do this because we allow the use of implied bounds on associated types: if we have `FromEnv(SomeType: Trait)` in our environment, the lowering rules chapter indicates that we are able to deduce `FromEnv(<SomeType as Trait>::Assoc: Bounds_assoc)` without knowing what the precise value of `<SomeType as Trait>::Assoc` is.
Some examples for the generated goal:
```rust
// Trait Program Clauses

// These are program clauses that come from the trait definitions below
// and that the trait solver can use for its reasonings. I'm just restating
// them here so that we have them in mind.

trait Copy { }
// `WellFormed(Self: Copy) :- Implemented(Self: Copy).`

trait Partial where Self: Copy { }
// ```
// WellFormed(Self: Partial) :-
//     Implemented(Self: Partial) &&
//     WellFormed(Self: Copy).
// ```

trait Complete where Self: Partial { }
// ```
// WellFormed(Self: Complete) :-
//     Implemented(Self: Complete) &&
//     WellFormed(Self: Partial).
// ```

// Impl WF Goals

impl<T> Partial for T where T: Complete { }
// The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(T: Complete)) {
//         WellFormed(T: Partial)
//     }
// }
// ```
// Then proving `WellFormed(T: Partial)` amounts to proving
// `Implemented(T: Partial)` and `Implemented(T: Copy)`.
// Both those facts can be deduced from the `FromEnv(T: Complete)` in our
// environment: this impl is legal.

impl<T> Complete for T { }
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(T: Complete)
// }
// ```
// Then proving `WellFormed(T: Complete)` amounts to proving
// `Implemented(T: Complete)`, `Implemented(T: Partial)` and
// `Implemented(T: Copy)`.
//
// `Implemented(T: Complete)` can be proved thanks to the
// `impl<T> Complete for T` blanket impl.
//
// `Implemented(T: Partial)` can be proved thanks to the
// `impl<T> Partial for T where T: Complete` impl and because we know
// `T: Complete` holds.
//
// However, `Implemented(T: Copy)` cannot be proved: the impl is illegal.
// An additional `where T: Copy` bound would be sufficient to make that impl
// legal.

trait Bar { }

impl<T> Bar for T where <T as Iterator>::Item: Bar { }
// We have a non-parameter type appearing in the where clauses:
// `<T as Iterator>::Item`. The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(<T as Iterator>::Item: Bar)) {
//         WellFormed(T: Bar) &&
//         WellFormed(<T as Iterator>::Item: Bar)
//     }
// }
// ```
// And `WellFormed(<T as Iterator>::Item: Bar)` is not provable: we'd need
// an additional `where T: Iterator` for example.

trait Foo { }

trait Bar {
    type Item: Foo;
}

struct Stuff<T> { }

impl<T> Bar for Stuff<T> where T: Foo {
    type Item = T;
}
// The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(T: Foo)) {
//         WellFormed(T: Foo).
//     }
// }
// ```
// which is provable.

trait Debug { ... }
// `WellFormed(Self: Debug) :- Implemented(Self: Debug).`

struct Box<T> { ... }

impl<T> Debug for Box<T> where T: Debug { ... }

trait PointerFamily {
    type Pointer<T>: Debug where T: Debug;
}
// `WellFormed(Self: PointerFamily) :- Implemented(Self: PointerFamily).`

struct BoxFamily;

impl PointerFamily for BoxFamily {
    type Pointer<T> = Box<T> where T: Debug;
}
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(BoxFamily: PointerFamily) &&
//
//     if (FromEnv(T: Debug)) {
//         WellFormed(Box<T>: Debug) &&
//         WellFormed(Box<T>)
//     }
// }
// ```
// `WellFormed(BoxFamily: PointerFamily)` amounts to proving
// `Implemented(BoxFamily: PointerFamily)`, which is ok thanks to our impl.
//
// `WellFormed(Box<T>)` is always true (there are no where clauses on the
// `Box` type definition).
//
// Moreover, we have an `impl<T: Debug> Debug for Box<T>`, hence
// we can prove `WellFormed(Box<T>: Debug)` and the impl is indeed legal.

trait Foo {
    type Assoc<T>;
}

struct OnlyClone<T: Clone> { ... }

impl Foo for i32 {
    type Assoc<T> = OnlyClone<T>;
}
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(i32: Foo) &&
//     WellFormed(OnlyClone<T>)
// }
// ```
// however `WellFormed(OnlyClone<T>)` is not provable because it requires
// `Implemented(T: Clone)`. It would be tempting to just add a `where T: Clone`
// bound inside the `impl Foo for i32` block, however we saw that it was
// illegal to add where clauses that didn't come from the trait definition.
```
Canonical queries
The "start" of the trait system is the canonical query (these are
both queries in the more general sense of the word – something you
would like to know the answer to – and in the
rustc-specific sense). The idea is that the type
checker or other parts of the system, may in the course of doing their
thing want to know whether some trait is implemented for some type
(e.g., is u32: Debug
true?). Or they may want to
normalize some associated type.
This section covers queries at a fairly high level of abstraction. The subsections look a bit more closely at how these ideas are implemented in rustc.
The traditional, interactive Prolog query
In a traditional Prolog system, when you start a query, the solver will run off and start supplying you with every possible answer it can find. So given something like this:

```text
?- Vec<i32>: AsRef<?U>
```

The solver might answer:

```text
Vec<i32>: AsRef<[i32]>
    continue? (y/n)
```

This `continue` bit is interesting. The idea in Prolog is that the solver is finding all possible instantiations of your query that are true. In this case, if we instantiate `?U = [i32]`, then the query is true (note that a traditional Prolog interface does not, directly, tell us a value for `?U`, but we can infer one by unifying the response with our original query – Rust's solver gives back a substitution instead). If we were to hit `y`, the solver might then give us another possible answer:

```text
Vec<i32>: AsRef<Vec<i32>>
    continue? (y/n)
```

This answer derives from the fact that there is a reflexive impl (`impl<T> AsRef<T> for T`) for `AsRef`. If we were to hit `y` again, then we might get back a negative response:

```text
no
```

Naturally, in some cases, there may be no possible answers, and hence the solver will just give me back `no` right away:

```text
?- Box<i32>: Copy
no
```
In some cases, there might be an infinite number of responses. So for example if I gave this query, and I kept hitting `y`, then the solver would never stop giving me back answers:

```text
?- Vec<?U>: Clone
Vec<i32>: Clone
    continue? (y/n)
Vec<Box<i32>>: Clone
    continue? (y/n)
Vec<Box<Box<i32>>>: Clone
    continue? (y/n)
Vec<Box<Box<Box<i32>>>>: Clone
    continue? (y/n)
```

As you can imagine, the solver will gleefully keep adding another layer of `Box` until we ask it to stop, or it runs out of memory.

Another interesting thing is that queries might still have variables in them. For example:

```text
?- Rc<?T>: Clone
```

might produce the answer:

```text
Rc<?T>: Clone
    continue? (y/n)
```

After all, `Rc<?T>: Clone` is true no matter what type `?T` is.
A trait query in rustc
The trait queries in rustc work somewhat differently. Instead of trying to enumerate all possible answers for you, they are looking for an unambiguous answer. In particular, when they tell you the value for a type variable, that means that this is the only possible instantiation that you could use, given the current set of impls and where-clauses, that would be provable. (Internally within the solver, though, they can potentially enumerate all possible answers. See the description of the SLG solver for details.)
The response to a trait query in rustc is typically a `Result<QueryResult<T>, NoSolution>` (where the `T` will vary a bit depending on the query itself). The `Err(NoSolution)` case indicates that the query was false and had no answers (e.g., `Box<i32>: Copy`). Otherwise, the `QueryResult` gives back information about the possible answer(s) we did find. It consists of four parts:

- Certainty: tells you how sure we are of this answer. It can have two values:
  - `Proven` means that the result is known to be true.
    - This might be the result for trying to prove `Vec<i32>: Clone`, say, or `Rc<?T>: Clone`.
  - `Ambiguous` means that there were things we could not yet prove to be either true or false, typically because more type information was needed. (We'll see an example shortly.)
    - This might be the result for trying to prove `Vec<?T>: Clone`.
- Var values: values for each of the unbound inference variables (like `?T`) that appeared in your original query. (Remember that in Prolog, we had to infer these.)
  - As we'll see in the example below, we can get back var values even for `Ambiguous` cases.
- Region constraints: these are relations that must hold between the lifetimes that you supplied as inputs. We'll ignore these here, but see the section on handling regions in traits for more details.
- Value: the query result also comes with a value of type `T`. For some specialized queries – like normalizing associated types – this is used to carry back an extra result, but it's often just `()`.
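Putting those four parts together, the response has roughly this shape (a hedged sketch with placeholder types, not rustc's exact definitions):

```rust
// The overall shape of a query response.
struct QueryResult<T> {
    certainty: Certainty,                      // Proven or Ambiguous
    var_values: Vec<Value>,                    // one value per input variable
    region_constraints: Vec<RegionConstraint>, // lifetime relations to record
    value: T,                                  // extra payload, often `()`
}

enum Certainty {
    Proven,
    Ambiguous,
}

struct Value;            // placeholder for a type/region value
struct RegionConstraint; // placeholder for an outlives relation
```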
Examples
Let's work through an example query to see what all the parts mean.
Consider the `Borrow` trait. This trait has a number of impls; among them, there are these two (for clarity, I've written the `Sized` bounds explicitly):

```rust
impl<T> Borrow<T> for T where T: ?Sized
impl<T> Borrow<[T]> for Vec<T> where T: Sized
```
Example 1. Imagine we are type-checking this (rather artificial) bit of code:

```rust
fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }

fn main() {
    let mut t: Vec<_> = vec![]; // Type: Vec<?T>
    let mut u: Option<_> = None; // Type: Option<?U>
    foo(t, u); // Example 1: requires `Vec<?T>: Borrow<?U>`
    ...
}
```
As the comments indicate, we first create two variables `t` and `u`; `t` is an empty vector and `u` is a `None` option. Both of these variables have unbound inference variables in their type: `?T` represents the elements in the vector `t` and `?U` represents the value stored in the option `u`. Next, we invoke `foo`; comparing the signature of `foo` to its arguments, we wind up with `A = Vec<?T>` and `B = ?U`. Therefore, the where clause on `foo` requires that `Vec<?T>: Borrow<?U>`. This is thus our first example trait query.

There are many possible solutions to the query `Vec<?T>: Borrow<?U>`; for example:

- `?U = Vec<?T>`,
- `?U = [?T]`,
- `?T = u32, ?U = [u32]`,
- and so forth.

Therefore, the result we get back would be as follows (I'm going to ignore region constraints and the "value"):

- Certainty: `Ambiguous` – we're not sure yet if this holds
- Var values: `[?T = ?T, ?U = ?U]` – we learned nothing about the values of the variables
In short, the query result says that it is too soon to say much about whether this trait is proven. During type-checking, this is not an immediate error: instead, the type checker would hold on to this requirement (`Vec<?T>: Borrow<?U>`) and wait. As we'll see in the next example, it may happen that `?T` and `?U` wind up constrained from other sources, in which case we can try the trait query again.
Example 2. We can now extend our previous example a bit, and assign a value to `u`:

```rust
fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }

fn main() {
    // What we saw before:
    let mut t: Vec<_> = vec![]; // Type: Vec<?T>
    let mut u: Option<_> = None; // Type: Option<?U>
    foo(t, u); // `Vec<?T>: Borrow<?U>` => ambiguous

    // New stuff:
    u = Some(vec![]); // ?U = Vec<?V>
}
```
As a result of this assignment, the type of `u` is forced to be `Option<Vec<?V>>`, where `?V` represents the element type of the vector. This in turn implies that `?U` is unified to `Vec<?V>`.

Let's suppose that the type checker decides to revisit the "as-yet-unproven" trait obligation we saw before, `Vec<?T>: Borrow<?U>`. `?U` is no longer an unbound inference variable; it now has a value, `Vec<?V>`. So, if we "refresh" the query with that value, we get:

```text
Vec<?T>: Borrow<Vec<?V>>
```

This time, there is only one impl that applies, the reflexive impl:

```rust
impl<T> Borrow<T> for T where T: ?Sized
```

Therefore, the trait checker will answer:

- Certainty: `Proven`
- Var values: `[?T = ?T, ?V = ?T]`

Here, it is saying that we have indeed proven that the obligation holds, and we also know that `?T` and `?V` are the same type (but we don't know what that type is yet!).

(In fact, as the function ends here, the type checker would give an error at this point, since the element types of `t` and `u` are still not yet known, even though they are known to be the same.)
Canonicalization
Canonicalization is the process of isolating an inference value from its context. It is a key part of implementing canonical queries, and you may wish to read the parent chapter to get more context.
Canonicalization is really based on a very simple concept: every inference variable is always in one of two states: either it is unbound, in which case we don't know yet what type it is, or it is bound, in which case we do. So to isolate some data-structure T that contains types/regions from its environment, we just walk down and find the unbound variables that appear in T; those variables get replaced with "canonical variables", starting from zero and numbered in a fixed order (left to right, for the most part, but really it doesn't matter as long as it is consistent).
So, for example, if we have the type `X = (?T, ?U)`, where `?T` and `?U` are distinct, unbound inference variables, then the canonical form of `X` would be `(?0, ?1)`, where `?0` and `?1` represent these canonical placeholders. Note that the type `Y = (?U, ?T)` also canonicalizes to `(?0, ?1)`. But the type `Z = (?T, ?T)` would canonicalize to `(?0, ?0)` (as would `(?U, ?U)`). In other words, the exact identity of the inference variables is not important – unless they are repeated.
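A minimal sketch of this numbering over a toy type representation (the `Ty` enum is hypothetical; rustc's real canonicalizer also handles regions, universes, and more):

```rust
use std::collections::HashMap;

enum Ty {
    Infer(u32),     // an unbound inference variable like `?T`
    Tuple(Vec<Ty>), // e.g. `(?T, ?U)`
}

// Replace each distinct inference variable with the next canonical
// index (?0, ?1, ...); repeated variables map to the same index.
fn canonicalize(ty: &Ty, map: &mut HashMap<u32, u32>) -> Ty {
    match ty {
        Ty::Infer(v) => {
            let next = map.len() as u32;
            Ty::Infer(*map.entry(*v).or_insert(next))
        }
        Ty::Tuple(tys) => {
            Ty::Tuple(tys.iter().map(|t| canonicalize(t, map)).collect())
        }
    }
}
```

Under this scheme `(?T, ?U)` and `(?U, ?T)` both become `(?0, ?1)`, while `(?T, ?T)` becomes `(?0, ?0)`, exactly as described above.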
We use this to improve caching as well as to detect cycles and other things during trait resolution. Roughly speaking, the idea is that if two trait queries have the same canonical form, then they will get the same answer. That answer will be expressed in terms of the canonical variables (`?0`, `?1`), which we can then map back to the original variables (`?T`, `?U`).
Canonicalizing the query
To see how it works, imagine that we are asking to solve the following trait query: `?A: Foo<'static, ?B>`, where `?A` and `?B` are unbound. This query contains two unbound variables, but it also contains the lifetime `'static`. The trait system generally ignores all lifetimes and treats them equally, so when canonicalizing, we will also replace any free lifetime with a canonical variable. (Note that `'static` is actually a free lifetime variable here. We are not considering it in the typing context of the whole program but only in the context of this trait reference. Mathematically, we are not quantifying over the whole program, but only this obligation.) Therefore, we get the following result:

```text
?0: Foo<'?1, ?2>
```

Sometimes we write this differently, like so:

```text
for<T,L,T> { ?0: Foo<'?1, ?2> }
```

This `for<>` gives some information about each of the canonical variables within. In this case, each `T` indicates a type variable, so `?0` and `?2` are types; the `L` indicates a lifetime variable, so `?1` is a lifetime. The `canonicalize` method also gives back a `CanonicalVarValues` array OV with the "original values" for each canonicalized variable:

```text
[?A, 'static, ?B]
```

We'll need this vector OV later, when we process the query response.
Executing the query
Once we've constructed the canonical query, we can try to solve it. To do so, we will wind up creating a fresh inference context and instantiating the canonical query in that context. The idea is that we create a substitution S from the canonical form containing a fresh inference variable (of suitable kind) for each canonical variable. So, for our example query:

```text
for<T,L,T> { ?0: Foo<'?1, ?2> }
```

the substitution S might be:

```text
S = [?A, '?B, ?C]
```

We can then replace the bound canonical variables (`?0`, etc) with these inference variables, yielding the following fully instantiated query:

```text
?A: Foo<'?B, ?C>
```

Remember that substitution S, though! We're going to need it later.
OK, now that we have a fresh inference context and an instantiated query, we can go ahead and try to solve it. The trait solver itself is explained in more detail in another section, but suffice to say that it will compute a certainty value (`Proven` or `Ambiguous`) and have side-effects on the inference variables we've created. For example, if there were only one impl of `Foo`, like so:

```rust
impl<'a, X> Foo<'a, X> for Vec<X>
where X: 'a
{ ... }
```

then we might wind up with a certainty value of `Proven`, as well as creating fresh inference variables `'?D` and `?E` (to represent the parameters on the impl) and unifying as follows:

- `'?B = '?D`
- `?A = Vec<?E>`
- `?C = ?E`

We would also accumulate the region constraint `?E: '?D`, due to the where clause.
In order to create our final query result, we have to "lift" these values out of the query's inference context and into something that can be reapplied in our original inference context. We do that by re-applying canonicalization, but to the query result.
Canonicalizing the query result
As discussed in the parent section, most trait queries wind up with a result that brings together a "certainty value" `certainty`, a result substitution `var_values`, and some region constraints. To create this, we wind up re-using the substitution S that we created when first instantiating our query. To refresh your memory, we had a query

```text
for<T,L,T> { ?0: Foo<'?1, ?2> }
```

for which we made a substitution S:

```text
S = [?A, '?B, ?C]
```

We then did some work which unified some of those variables with other things. If we "refresh" S with the latest results, we get:

```text
S = [Vec<?E>, '?D, ?E]
```

These are precisely the new values for the three input variables from our original query. Note though that they include some new variables (like `?E`). We can make those go away by canonicalizing again! We don't just canonicalize S, though, we canonicalize the whole query response QR:

```text
QR = {
    certainty: Proven,             // or whatever
    var_values: [Vec<?E>, '?D, ?E] // this is S
    region_constraints: [?E: '?D], // from the impl
    value: (),                     // for our purposes, just (), but
                                   // in some cases this might have
                                   // a type or other info
}
```

The result would be as follows:

```text
Canonical(QR) = for<T, L> {
    certainty: Proven,
    var_values: [Vec<?0>, '?1, ?0]
    region_constraints: [?0: '?1],
    value: (),
}
```
(One subtle point: when we canonicalize the query result, we do not use any special treatment for free lifetimes. Note that both references to `'?D`, for example, were converted into the same canonical variable (`?1`). This is in contrast to the original query, where we canonicalized every free lifetime into a fresh canonical variable.)
Now, this result must be reapplied in each context where needed.
Processing the canonicalized query result
In the previous section we produced a canonical query result. We now have to apply that result in our original context. If you recall, way back in the beginning, we were trying to prove this query:

```text
?A: Foo<'static, ?B>
```

We canonicalized that into this:

```text
for<T,L,T> { ?0: Foo<'?1, ?2> }
```

and now we got back a canonical response:

```text
for<T, L> {
    certainty: Proven,
    var_values: [Vec<?0>, '?1, ?0]
    region_constraints: [?0: '?1],
    value: (),
}
```

We now want to apply that response to our context. Conceptually, how we do that is to (a) instantiate each of the canonical variables in the result with a fresh inference variable, (b) unify the values in the result with the original values, and then (c) record the region constraints for later. Doing step (a) would yield a result of:

```text
{
    certainty: Proven,
    var_values: [Vec<?C>, '?D, ?C]
                     ^^   ^^^ fresh inference variables
    region_constraints: [?C: '?D],
    value: (),
}
```
Step (b) would then unify:
?A with Vec<?C>
'static with '?D
?B with ?C
And finally the region constraint of ?C: 'static
would be recorded
for later verification.
(What we actually do is a mildly optimized variant of that: Rather
than eagerly instantiating all of the canonical values in the result
with variables, we instead walk the vector of values, looking for
cases where the value is just a canonical variable. In our example,
values[2]
is ?C
, so that means we can deduce that ?C := ?B
and
'?D := 'static
. This gives us a partial set of values. Anything for
which we do not find a value, we create an inference variable.)
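To make steps (a)–(c) easier to follow, here is a rough sketch in code form. This is not rustc's actual API: every name in it (`apply_query_response`, `fresh_substitution_for`, `record_region_constraint`, and the types involved) is invented purely for illustration.

```rust
// Hypothetical sketch only -- none of these names exist in rustc.
fn apply_query_response(infcx: &mut InferCtxt, response: &CanonicalResponse) {
    // (a) Instantiate each canonical variable in the result with a
    // fresh inference variable of the matching kind.
    let subst = infcx.fresh_substitution_for(&response.binders);

    // (b) Unify each original input value with the corresponding
    // (instantiated) value from `var_values`.
    for (original, result) in infcx.original_values().zip(&response.var_values) {
        infcx.unify(original, result.substitute(&subst));
    }

    // (c) Record the instantiated region constraints for later verification.
    for constraint in &response.region_constraints {
        infcx.record_region_constraint(constraint.substitute(&subst));
    }
}
```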
The On-Demand SLG solver
Given a set of program clauses (provided by our lowering rules) and a query, we need to return the result of the query and the value of any type variables we can determine. This is the job of the solver.
For example, exists<T> { Vec<T>: FromIterator<u32> }
has one solution, so
its result is Unique; substitution [?T := u32]
. A solution also comes with
a set of region constraints, which we'll ignore in this introduction.
Goals of the Solver
On demand
There are often many, or even infinitely many, solutions to a query. For
example, say we want to prove that exists<T> { Vec<T>: Debug }
for some
type ?T
. Our solver should be capable of yielding one answer at a time, say
?T = u32
, then ?T = i32
, and so on, rather than iterating over every type
in the type system. If we need more answers, we can request more until we are
done. This is similar to how Prolog works.
See also: The traditional, interactive Prolog query
Breadth-first
`Vec<?T>: Debug` is true if `?T: Debug`. This leads to a cycle: `Vec<u32>`, `Vec<Vec<u32>>`, `Vec<Vec<Vec<u32>>>`, and so on all implement `Debug`. Our solver ought to be breadth-first and consider answers like `[Vec<u32>: Debug, Vec<i32>: Debug, ...]` before it recurses, or we may never find the answer we're looking for.
Cacheable
To speed up compilation, we need to cache results, including partial results left over from past solver queries.
Description of how it works
The basis of the solver is the Forest
type. A forest stores a
collection of tables as well as a stack. Each table represents
the stored results of a particular query that is being performed, as
well as the various strands, which are basically suspended
computations that may be used to find more answers. Tables are
interdependent: solving one query may require solving others.
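In code form, the shape described above might be sketched like so. The real definitions live in `chalk_engine` and carry considerably more detail, so treat every type here as a simplified stand-in:

```rust
use std::collections::VecDeque;

// Placeholder types, standing in for the real chalk_engine definitions.
struct CanonicalGoal; // a u-canonicalized query
struct Answer;        // a substitution (plus conditions) answering a table's goal
struct Strand;        // a suspended computation that may yield more answers
type TableIndex = usize;

struct Table {
    goal: CanonicalGoal,       // the query this table stores results for
    answers: Vec<Answer>,      // answers found so far
    strands: VecDeque<Strand>, // processed round-robin to find more answers
}

struct Forest {
    tables: Vec<Table>,     // indexed by TableIndex; keyed by canonical goal
    stack: Vec<TableIndex>, // tables actively being solved
}
```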
Walkthrough
Perhaps the easiest way to explain how the solver works is to walk through an example. Let's imagine that we have the following program:
trait Debug { }
struct u32 { }
impl Debug for u32 { }
struct Rc<T> { }
impl<T: Debug> Debug for Rc<T> { }
struct Vec<T> { }
impl<T: Debug> Debug for Vec<T> { }
Now imagine that we want to find answers for the query exists<T> { Rc<T>: Debug }
. The first step would be to u-canonicalize this query; this is the
act of giving canonical names to all the unbound inference variables based on
the order of their left-most appearance, as well as canonicalizing the
universes of any universally bound names (e.g., the T
in forall<T> { ... }
). In this case, there are no universally bound names, but the canonical
form Q of the query might look something like:
Rc<?0>: Debug
where ?0
is a variable in the root universe U0. We would then go and
look for a table with this canonical query as the key: since the forest is
empty, this lookup will fail, and we will create a new table T0,
corresponding to the u-canonical goal Q.
Ignoring negative reasoning and regions. To start, we'll ignore
the possibility of negative goals like not { Foo }
. We'll phase them
in later, as they bring several complications.
Creating a table. When we first create a table, we also initialize
it with a set of initial strands. A "strand" is kind of like a
"thread" for the solver: it contains a particular way to produce an
answer. The initial set of strands for a goal like Rc<?0>: Debug
(i.e., a "domain goal") is determined by looking for clauses in the
environment. In Rust, these clauses derive from impls, but also from
where-clauses that are in scope. In the case of our example, there
would be three clauses, each coming from the program. Using a
Prolog-like notation, these look like:
(u32: Debug).
(Rc<T>: Debug) :- (T: Debug).
(Vec<T>: Debug) :- (T: Debug).
To create our initial strands, then, we will try to apply each of
these clauses to our goal of Rc<?0>: Debug
. The first and third
clauses are inapplicable because u32
and Vec<?0>
cannot be unified
with Rc<?0>
. The second clause, however, will work.
What is a strand? Let's talk a bit more about what a strand is. In the code, a strand
is the combination of an inference table, an X-clause, and (possibly)
a selected subgoal from that X-clause. But what is an X-clause
(ExClause
, in the code)? An X-clause pulls together a few things:
- The current state of the goal we are trying to prove;
- A set of subgoals that have yet to be proven;
- There are also a few things we're ignoring for now:
- delayed literals, region constraints
The general form of an X-clause is written much like a Prolog clause, but with somewhat different semantics. Since we're ignoring delayed literals and region constraints, an X-clause just looks like this:
G :- L
where G is a goal and L is a set of subgoals that must be proven. (The L stands for literal -- when we address negative reasoning, a literal will be either a positive or negative subgoal.) The idea is that if we are able to prove L then the goal G can be considered true.
In the case of our example, we would wind up creating one strand, with an X-clause like so:
(Rc<?T>: Debug) :- (?T: Debug)
Here, the ?T
refers to one of the inference variables created in the
inference table that accompanies the strand. (I'll use named variables
to refer to inference variables, and numbered variables like ?0
to
refer to variables in a canonicalized goal; in the code, however, they
are both represented with an index.)
For each strand, we also optionally store a selected subgoal. This
is the subgoal after the turnstile (:-
) that we are currently trying
to prove in this strand. Initially, when a strand is first created,
there is no selected subgoal.
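Putting those pieces together, a strand might be sketched as follows. This is a simplification of the `ExClause` and strand types in the code, which also carry the delayed literals and region constraints we are ignoring for now; the placeholder types are for exposition only:

```rust
// Simplified placeholders for exposition.
struct Goal;            // the goal an X-clause can prove
struct Literal;         // a subgoal (positive or, later, negative)
struct InferenceTable;  // the inference variables local to this strand
struct SelectedSubgoal; // which subgoal is selected, and which answer we want next

struct ExClause {
    goal: Goal,             // G: the current state of the goal being proven
    subgoals: Vec<Literal>, // L: subgoals that have yet to be proven
}

struct Strand {
    infer: InferenceTable,
    ex_clause: ExClause,
    // None when the strand is first created; set once a subgoal is picked.
    selected_subgoal: Option<SelectedSubgoal>,
}
```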
Activating a strand. Now that we have created the table T0 and
initialized it with strands, we have to actually try and produce an answer.
We do this by invoking the ensure_root_answer
operation on the table:
specifically, we say ensure_root_answer(T0, A0)
, meaning "ensure that there
is a 0th answer A0 to query T0".
Remember that tables store not only strands, but also a vector of cached
answers. The first thing that ensure_root_answer
does is to check whether
answer A0 is in this vector. If so, we can just return immediately. In this
case, the vector will be empty, and hence that does not apply (this becomes
important for cyclic checks later on).
When there is no cached answer, ensure_root_answer
will try to produce one.
It does this by selecting a strand from the set of active strands -- the
strands are stored in a VecDeque
and hence processed in a round-robin
fashion. Right now, we have only one strand, storing the following X-clause
with no selected subgoal:
(Rc<?T>: Debug) :- (?T: Debug)
When we activate the strand, we see that we have no selected subgoal,
and so we first pick one of the subgoals to process. Here, there is only
one (?T: Debug
), so that becomes the selected subgoal, changing
the state of the strand to:
(Rc<?T>: Debug) :- selected(?T: Debug, A0)
Here, we write selected(L, An)
to indicate that (a) the literal L
is the selected subgoal and (b) which answer An
we are looking for. We
start out looking for A0
.
Processing the selected subgoal. Next, we have to try and find an
answer to this selected goal. To do that, we will u-canonicalize it
and try to find an associated table. In this case, the u-canonical
form of the subgoal is ?0: Debug
: we don't have a table yet for
that, so we can create a new one, T1. As before, we'll initialize T1
with strands. In this case, there will be three strands, because all
the program clauses are potentially applicable. Those three strands
will be:
- `(u32: Debug) :-`, derived from the program clause `(u32: Debug).`
  - Note: this strand has no subgoals.
- `(Vec<?U>: Debug) :- (?U: Debug)`, derived from the `Vec` impl.
- `(Rc<?V>: Debug) :- (?V: Debug)`, derived from the `Rc` impl.
We can thus summarize the state of the whole forest at this point as follows:
Table T0 [Rc<?0>: Debug]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A0)

Table T1 [?0: Debug]
  Strands:
    (u32: Debug) :-
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)
Delegation between tables. Now that the active strand from T0 has
created the table T1, it can try to extract an answer. It does this
via that same ensure_answer
operation we saw before. In this case,
the strand would invoke ensure_answer(T1, A0)
, since we will start
with the first answer. This will cause T1 to activate its first
strand, u32: Debug :-
.
This strand is somewhat special: it has no subgoals at all. This means
that the goal is proven. We can therefore add u32: Debug
to the set
of answers for our table, calling it answer A0 (it is the first
answer). The strand is then removed from the list of strands.
The state of table T1 is therefore:
Table T1 [?0: Debug]
  Answers:
    A0 = [?0 = u32]
  Strands:
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)
Note that I am writing out the answer A0 as a substitution that can be applied to the table goal; actually, in the code, the goals for each X-clause are also represented as substitutions, but in this exposition I've chosen to write them as full goals, following NFTD.
Since we now have an answer, ensure_answer(T1, A0)
will return Ok
to the table T0, indicating that answer A0 is available. T0 now has
the job of incorporating that result into its active strand. It does
this in two ways. First, it creates a new strand that is looking for
the next possible answer of T1. Next, it incorporates the answer from
A0 and removes the subgoal. The resulting state of table T0 is:
Table T0 [Rc<?0>: Debug]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A1)
    (Rc<u32>: Debug) :-
We then immediately activate the strand that incorporated the answer
(the Rc<u32>: Debug
one). In this case, that strand has no further
subgoals, so it becomes an answer to the table T0. This answer can
then be returned up to our caller, and the whole forest goes quiescent
at this point (remember, we only do enough work to generate one
answer). The ending state of the forest at this point will be:
Table T0 [Rc<?0>: Debug]
  Answers:
    A0 = [?0 = u32]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A1)

Table T1 [?0: Debug]
  Answers:
    A0 = [?0 = u32]
  Strands:
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)
Here you can see how the forest captures both the answers we have created thus far and the strands that will let us try to produce more answers later on.
See also
- chalk_solve README, which contains links to papers used and acronyms referenced in the code
- This section is a lightly adapted version of the blog post An on-demand SLG solver for chalk
- Negative Reasoning in Chalk explains the need for negative reasoning, but not how the SLG solver does it
An Overview of Chalk
Chalk is under heavy development, so if any of these links are broken or if any of the information is inconsistent with the code or outdated, please open an issue so we can fix it. If you are able to fix the issue yourself, we would love your contribution!
Chalk recasts Rust's trait system explicitly in terms of logic programming by "lowering" Rust code into a kind of logic program we can then execute queries against. (See Lowering to Logic and Lowering Rules) Its goal is to be an executable, highly readable specification of the Rust trait system.
There are many expected benefits from this work. It will consolidate our existing, somewhat ad-hoc implementation into something far more principled and expressive, which should behave better in corner cases, and be much easier to extend.
Chalk Structure
Chalk has two main "products". The first of these is the
chalk_engine
crate, which defines the core SLG
solver. This is the part rustc uses.
The rest of chalk can be considered an elaborate testing harness. Chalk is capable of parsing Rust-like "programs", lowering them to logic, and performing queries on them.
Here's a sample session in the chalk repl, chalki. After feeding it our program, we perform some queries on it.
?- program
Enter a program; press Ctrl-D when finished
| struct Foo { }
| struct Bar { }
| struct Vec<T> { }
| trait Clone { }
| impl<T> Clone for Vec<T> where T: Clone { }
| impl Clone for Foo { }
?- Vec<Foo>: Clone
Unique; substitution [], lifetime constraints []
?- Vec<Bar>: Clone
No possible solution.
?- exists<T> { Vec<T>: Clone }
Ambiguous; no inference guidance
You can see more examples of programs and queries in the unit tests.
Next we'll go through each stage required to produce the output above.
Parsing (chalk_parse)
Chalk is designed to be incorporated with the Rust compiler, so the syntax and concepts it deals with heavily borrow from Rust. It is convenient for the sake of testing to be able to run chalk on its own, so chalk includes a parser for a Rust-like syntax. This syntax is orthogonal to the Rust AST and grammar. It is not intended to look exactly like it or support the exact same syntax.
The parser takes that syntax and produces an Abstract Syntax Tree (AST). You can find the complete definition of the AST in the source code.
The syntax contains things from Rust that we know and love, for example: traits, impls, and struct definitions. Parsing is often the first "phase" of transformation that a program goes through in order to become a format that chalk can understand.
Rust Intermediate Representation (chalk_rust_ir)
After getting the AST we convert it to a more convenient intermediate
representation called chalk_rust_ir
. This is sort of
analogous to the HIR in Rust. The process of converting to IR is called
lowering.
The chalk::program::Program
struct contains some "rust things"
but indexed and accessible in a different way. For example, if you have a
type like Foo<Bar>
, we would represent Foo
as a string in the AST but in
chalk::program::Program
, we use numeric indices (ItemId
).
The IR source code contains the complete definition.
Chalk Intermediate Representation (chalk_ir)
Once we have Rust IR it is time to convert it to "program clauses". A
ProgramClause
is essentially one of the following:
- A clause of the form
consequence :- conditions
where:-
is read as "if" andconditions = cond1 && cond2 && ...
- A universally quantified clause of the form
forall<T> { consequence :- conditions }
forall<T> { ... }
is used to represent universal quantification. See the section on Lowering to logic for more information.- A key thing to note about
forall
is that we don't allow you to "quantify" over traits, only types and regions (lifetimes). That is, you can't make a rule likeforall<Trait> { u32: Trait }
which would say "u32
implements all traits". You can however sayforall<T> { T: Trait }
meaning "Trait
is implemented by all types". forall<T> { ... }
is represented in the code using theBinders<T>
struct.
See also: Goals and Clauses
This is where we encode the rules of the trait system into logic. For example, if we have the following Rust:
impl<T: Clone> Clone for Vec<T> {}
We generate the following program clause:
forall<T> { (Vec<T>: Clone) :- (T: Clone) }
This rule dictates that Vec<T>: Clone
is only satisfied if T: Clone
is also
satisfied (i.e. "provable").
Similar to chalk::program::Program
which has "rust-like
things", chalk_ir defines ProgramEnvironment
which is "pure logic".
The main field in that struct is program_clauses
, which contains the
ProgramClause
s generated by the rules module.
Rules (chalk_solve)
The chalk_solve
crate (source code) defines the logic rules we
use for each item in the Rust IR. It works by iterating over every trait, impl,
etc. and emitting the rules that come from each one.
See also: Lowering Rules
Well-formedness checks
As part of lowering to logic, we also do some "well formedness" checks. See
the chalk_solve::wf
source code for where those are done.
See also: Well-formedness checking
Coherence
The method CoherenceSolver::specialization_priorities
in the coherence
module
(source code) checks "coherence", which means that it
ensures that two impls of the same trait for the same type cannot exist.
Solver (chalk_solve)
Finally, when we've collected all the program clauses we care about, we want to perform queries on it. The component that finds the answer to these queries is called the solver.
See also: The SLG Solver
Crates
Chalk's functionality is broken up into the following crates:
- chalk_engine: Defines the core SLG solver.
- chalk_rust_ir, containing the "HIR-like" form of the AST
- chalk_ir: Defines chalk's internal representation of types, lifetimes, and goals.
- chalk_solve: Combines `chalk_ir` and `chalk_engine`, effectively; implements the logic rules converting `chalk_rust_ir` to `chalk_ir`
  - Defines the `coherence` module, which implements coherence rules
  - `chalk_engine::context` provides the necessary hooks.
- chalk_parse: Defines the raw AST and a parser.
- chalk: Brings everything together. Defines the following modules:
  - `chalk::lowering`, which converts AST to `chalk_rust_ir`
  - Also includes chalki, chalk's REPL.
Testing
chalk has a test framework for lowering programs to logic, checking the lowered logic, and performing queries on it. This is how we test the implementation of chalk itself, and the viability of the lowering rules.
The main kind of tests in chalk are goal tests. They contain a program, which is expected to lower to logic successfully, and a set of queries (goals) along with the expected output. Here's an example. Since chalk's output can be quite long, goal tests support specifying only a prefix of the output.
Lowering tests check the stages that occur before we can issue queries to the solver: the lowering to chalk_rust_ir, and the well-formedness checks that occur after that.
Testing internals
Goal tests use a test!
macro that takes chalk's Rust-like
syntax and runs it through the full pipeline described above. The macro
ultimately calls the solve_goal
function.
Likewise, lowering tests use the lowering_success!
and
lowering_error!
macros.
More Resources
Blog Posts
- Lowering Rust traits to logic
- Unification in Chalk, part 1
- Unification in Chalk, part 2
- Negative reasoning in Chalk
- Query structure in chalk
- Cyclic queries in chalk
- An on-demand SLG solver for chalk
Bibliography
If you'd like to read more background material, here are some recommended texts and papers:
Programming with Higher-order Logic, by Dale Miller and Gopalan Nadathur, covers the key concepts of Lambda prolog. Although it's a slim little volume, it's the kind of book where you learn something new every time you open it.
"A proof procedure for the logic of Hereditary Harrop formulas", by Gopalan Nadathur. This paper covers the basics of universes, environments, and Lambda Prolog-style proof search. Quite readable.
"A new formulation of tabled resolution with delay", by Theresa Swift. This paper gives a kind of abstract treatment of the SLG formulation that is the basis for our on-demand solver.
Type checking
The rustc_typeck
crate contains the source for "type collection"
and "type checking", as well as a few other bits of related functionality. (It
draws heavily on the type inference and trait solving.)
Type collection
Type "collection" is the process of converting the types found in the HIR
(hir::Ty
), which represent the syntactic things that the user wrote, into the
internal representation used by the compiler (Ty<'tcx>
) – we also do
similar conversions for where-clauses and other bits of the function signature.
To try and get a sense for the difference, consider this function:
struct Foo { }
fn foo(x: Foo, y: self::Foo) { ... }
// ^^^ ^^^^^^^^^
Those two parameters x
and y
each have the same type: but they will have
distinct hir::Ty
nodes. Those nodes will have different spans, and of course
they encode the path somewhat differently. But once they are "collected" into
Ty<'tcx>
nodes, they will be represented by the exact same internal type.
Collection is defined as a bundle of queries for computing information about the various functions, traits, and other items in the crate being compiled. Note that each of these queries is concerned with interprocedural things – for example, for a function definition, collection will figure out the type and signature of the function, but it will not visit the body of the function in any way, nor examine type annotations on local variables (that's the job of type checking).
For more details, see the collect
module.
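For example, `type_of` and `fn_sig` are collection queries; a (hypothetical) snippet using them from somewhere with access to the `tcx` might look like:

```rust
// Given the DefId of a function item, collection can answer questions
// about its interface without ever visiting its body.
let ty = tcx.type_of(def_id);   // the type of the item
let sig = tcx.fn_sig(def_id);   // the function's signature
```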
TODO: actually talk about type checking...
Method lookup
Method lookup can be rather complex due to the interaction of a number of factors, such as self types, autoderef, trait lookup, etc. This file provides an overview of the process. More detailed notes are in the code itself, naturally.
One way to think of method lookup is that we convert an expression of the form:
receiver.method(...)
into a more explicit UFCS form:
Trait::method(ADJ(receiver), ...) // for a trait call
ReceiverType::method(ADJ(receiver), ...) // for an inherent method call
Here ADJ
is some kind of adjustment, which is typically a series of
autoderefs and then possibly an autoref (e.g., &**receiver
). However
we sometimes do other adjustments and coercions along the way, in
particular unsizing (e.g., converting from [T; n]
to [T]
).
Method lookup is divided into two major phases:
- Probing (
probe.rs
). The probe phase is when we decide what method to call and how to adjust the receiver. - Confirmation (
confirm.rs
). The confirmation phase "applies" this selection, updating the side-tables, unifying type variables, and otherwise doing side-effectful things.
One reason for this division is to be more amenable to caching. The
probe phase produces a "pick" (probe::Pick
), which is designed to be
cacheable across method-call sites. Therefore, it does not include
inference variables or other information.
The Probe phase
Steps
The first thing that the probe phase does is to create a series of
steps. This is done by progressively dereferencing the receiver type
until it cannot be deref'd anymore, as well as applying an optional
"unsize" step. So if the receiver has type Rc<Box<[T; 3]>>
, this
might yield:
Rc<Box<[T; 3]>>
Box<[T; 3]>
[T; 3]
[T]
Candidate assembly
We then search along those steps to create a list of candidates. A
Candidate
is a method item that might plausibly be the method being
invoked. For each candidate, we'll derive a "transformed self type"
that takes into account explicit self.
Candidates are grouped into two kinds, inherent and extension.
Inherent candidates are those that are derived from the
type of the receiver itself. So, if you have a receiver of some
nominal type Foo
(e.g., a struct), any methods defined within an
impl like impl Foo
are inherent methods. Nothing needs to be
imported to use an inherent method, they are associated with the type
itself (note that inherent impls can only be defined in the same
module as the type itself).
FIXME: Inherent candidates are not always derived from impls. If you
have a trait object, such as a value of type Box<ToString>
, then the
trait methods (to_string()
, in this case) are inherently associated
with it. Another case is type parameters, in which case the methods of
their bounds are inherent. However, this part of the rules is subject
to change: when DST's "impl Trait for Trait" is complete, trait object
dispatch could be subsumed into trait matching, and the type parameter
behavior should be reconsidered in light of where clauses.
TODO: Is this FIXME still accurate?
Extension candidates are derived from imported traits. If I have
the trait ToString
imported, and I call to_string()
on a value of
type T
, then we will go off to find out whether there is an impl of
ToString
for T
. These kinds of method calls are called "extension
methods". They can be defined in any module, not only the one that
defined T
. Furthermore, you must import the trait to call such a
method.
So, let's continue our example. Imagine that we were calling a method
foo
with the receiver Rc<Box<[T; 3]>>
and there is a trait Foo
that defines it with &self
for the type Rc<U>
as well as a method
on the type Box
that defines Foo
but with &mut self
. Then we
might have two candidates:
&Rc<Box<[T; 3]>>    from the impl of `Foo` for `Rc<U>` where `U=Box<[T; 3]>`
&mut Box<[T; 3]>    from the inherent impl on `Box<U>` where `U=[T; 3]`
Candidate search
Finally, to actually pick the method, we will search down the steps, trying to match the receiver type against the candidate types. At each step, we also consider an auto-ref and auto-mut-ref to see whether that makes any of the candidates match. We pick the first step where we find a match.
In the case of our example, the first step is Rc<Box<[T; 3]>>
,
which does not itself match any candidate. But when we autoref it, we
get the type &Rc<Box<[T; 3]>>
which does match. We would then
recursively consider all where-clauses that appear on the impl: if
those match (or we cannot rule out that they do), then this is the
method we would pick. Otherwise, we would continue down the series of
steps.
Variance of type and lifetime parameters
For a more general background on variance, see the background appendix.
During type checking we must infer the variance of type and lifetime parameters. The algorithm is taken from Section 4 of the paper "Taming the Wildcards: Combining Definition- and Use-Site Variance" published in PLDI'11 and written by Altidor et al., and hereafter referred to as The Paper.
This inference is explicitly designed not to consider the uses of
types within code. To determine the variance of type parameters
defined on type X
, we only consider the definition of the type X
and the definitions of any types it references.
We only infer variance for type parameters found on data types
like structs and enums. In these cases, there is a fairly straightforward
explanation for what variance means. The variance of the type
or lifetime parameters defines whether T<A>
is a subtype of T<B>
(resp. T<'a>
and T<'b>
) based on the relationship of A
and B
(resp. 'a
and 'b
).
We do not infer variance for type parameters found on traits, functions, or impls. Variance on trait parameters can indeed make sense (and we used to compute it) but it is actually rather subtle in meaning and not that useful in practice, so we removed it. See the addendum for some details. Variances on function/impl parameters, on the other hand, don't make sense because these parameters are instantiated and then forgotten; they don't persist in types or compiled byproducts.
Notation
We use the notation of The Paper throughout this chapter:
+
is covariance.-
is contravariance.*
is bivariance.o
is invariance.
The algorithm
The basic idea is quite straightforward. We iterate over the types
defined and, for each use of a type parameter X
, accumulate a
constraint indicating that the variance of X
must be valid for the
variance of that use site. We then iteratively refine the variance of
X
until all constraints are met. There is always a solution, because at
the limit we can declare all type parameters to be invariant and all
constraints will be satisfied.
As a simple example, consider:
enum Option<A> { Some(A), None }
enum OptionalFn<B> { Some(|B|), None }
enum OptionalMap<C> { Some(|C| -> C), None }
Here, we will generate the constraints:
1. V(A) <= +
2. V(B) <= -
3. V(C) <= +
4. V(C) <= -
These indicate that (1) the variance of A must be at most covariant; (2) the variance of B must be at most contravariant; and (3, 4) the variance of C must be at most covariant and contravariant. All of these results are based on a variance lattice defined as follows:
   *      Top (bivariant)
-     +
   o      Bottom (invariant)
Based on this lattice, the solution V(A)=+
, V(B)=-
, V(C)=o
is the
optimal solution. Note that there is always a naive solution which
just declares all variables to be invariant.
You may be wondering why fixed-point iteration is required. The reason is that the variance of a use site may itself be a function of the variance of other type parameters. In full generality, our constraints take the form:
V(X) <= Term
Term := + | - | * | o | V(X) | Term x Term
Here the notation V(X)
indicates the variance of a type/region
parameter X
with respect to its defining class. Term x Term
represents the "variance transform" as defined in the paper:
If the variance of a type variable
X
in type expressionE
isV2
and the definition-site variance of the [corresponding] type parameter of a classC
isV1
, then the variance ofX
in the type expressionC<E>
isV3 = V1.xform(V2)
.
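One way to realize this lattice and transform in code is sketched below. It is modeled on (but simplified from) rustc's `ty::Variance::xform`; treat it as an illustration of the math rather than the compiler's exact source:

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
enum Variance {
    Bivariant,     // *
    Covariant,     // +
    Contravariant, // -
    Invariant,     // o
}

impl Variance {
    /// `v1.xform(v2)` computes the variance of a use that has variance
    /// `v2` when it appears nested inside a context of variance `v1`.
    fn xform(self, v: Variance) -> Variance {
        use Variance::*;
        match (self, v) {
            (Bivariant, _) => Bivariant, // a bivariant context ignores its contents
            (Invariant, _) => Invariant, // an invariant context pins its contents
            (Covariant, v) => v,         // a covariant context preserves variance
            (Contravariant, Bivariant) => Bivariant,
            (Contravariant, Invariant) => Invariant,
            (Contravariant, Covariant) => Contravariant, // contravariance flips...
            (Contravariant, Contravariant) => Covariant, // ...and flips back
        }
    }
}
```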
Constraints
If I have a struct or enum with where clauses:
struct Foo<T: Bar> { ... }
you might wonder whether the variance of T
with respect to Bar
affects the
variance T
with respect to Foo
. I claim no. The reason: assume that T
is
invariant with respect to Bar
but covariant with respect to Foo
. And then
we have a Foo<X>
that is upcast to Foo<Y>
, where X <: Y
. However, while `X: Bar` holds, `Y: Bar` does not. In that case, the upcast will be illegal,
but not because of a variance failure, but rather because the target type
Foo<Y>
is itself just not well-formed. Basically we get to assume
well-formedness of all types involved before considering variance.
Dependency graph management
Because variance is a whole-crate inference, its dependency graph can become quite muddled if we are not careful. To resolve this, we refactor into two queries:
crate_variances
computes the variance for all items in the current crate.variances_of
accesses the variance for an individual reading; it works by requestingcrate_variances
and extracting the relevant data.
If you limit yourself to reading variances_of
, your code will only
depend then on the inference of that particular item.
Ultimately, this setup relies on the red-green algorithm. In particular,
every variance query effectively depends on all type definitions in the entire
crate (through crate_variances
), but since most changes will not result in a
change to the actual results from variance inference, the variances_of
query
will wind up being considered green after it is re-evaluated.
Addendum: Variance on traits
As mentioned above, we used to permit variance on traits. This was
computed based on the appearance of trait type parameters in
method signatures and was used to represent the compatibility of
vtables in trait objects (and also "virtual" vtables or dictionary
in trait bounds). One complication was that variance for
associated types is less obvious, since they can be projected out
and put to myriad uses, so it's not clear when it is safe to allow
X<A>::Bar
to vary (or indeed just what that means). Moreover (as
covered below) all inputs on any trait with an associated type had
to be invariant, limiting the applicability. Finally, the
annotations (MarkerTrait
, PhantomFn
) needed to ensure that all
trait type parameters had a variance were confusing and annoying
for little benefit.
Just for historical reference, I am going to preserve some text indicating how one could interpret variance and trait matching.
Variance and object types
Just as with structs and enums, we can decide the subtyping
relationship between two object types &Trait<A>
and &Trait<B>
based on the relationship of A
and B
. Note that for object
types we ignore the Self
type parameter – it is unknown, and
the nature of dynamic dispatch ensures that we will always call a
function that is expected the appropriate Self
type. However, we
must be careful with the other type parameters, or else we could
end up calling a function that is expecting one type but provided
another.
To see what I mean, consider a trait like so:
trait ConvertTo<A> {
    fn convertTo(&self) -> A;
}
Intuitively, if we had one object O=&ConvertTo<Object>
and another
S=&ConvertTo<String>
, then S <: O
because String <: Object
(presuming Java-like "string" and "object" types, my go-to examples
for subtyping). The actual algorithm would be to compare the
(explicit) type parameters pairwise respecting their variance: here,
the type parameter A is covariant (it appears only in a return
position), and hence we require that String <: Object
.
You'll note though that we did not consider the binding for the
(implicit) Self
type parameter: in fact, it is unknown, so that's
good. The reason we can ignore that parameter is precisely because we
don't need to know its value until a call occurs, and at that time (as
you said) the dynamic nature of virtual dispatch means the code we run
will be correct for whatever value Self
happens to be bound to for
the particular object whose method we called. Self
is thus different
from A
, because the caller requires that A
be known in order to
know the return type of the method convertTo()
. (As an aside, we
have rules preventing methods where Self
appears outside of the
receiver position from being called via an object.)
Trait variance and vtable resolution
But traits aren't only used with objects. They're also used when deciding whether a given impl satisfies a given trait bound. To set the scene here, imagine I had a function:
fn convertAll<A,T:ConvertTo<A>>(v: &[T]) { ... }
Now imagine that I have an implementation of ConvertTo
for Object
:
impl ConvertTo<i32> for Object { ... }
And I want to call convertAll
on an array of strings. Suppose
further that for whatever reason I specifically supply the value of
String
for the type parameter T
:
let mut vector = vec!["string", ...];
convertAll::<i32, String>(vector);
Is this legal? To put another way, can we apply the impl
for
Object
to the type String
? The answer is yes, but to see why
we have to expand out what will happen:
-
convertAll
will create a pointer to one of the entries in the vector, which will have type&String
-
It will then call the impl of
convertTo()
that is intended for use with objects. This has the typefn(self: &Object) -> i32
.It is OK to provide a value for
self
of type&String
because&String <: &Object
.
OK, so intuitively we want this to be legal, so let's bring this back
to variance and see whether we are computing the correct result. We
must first figure out how to phrase the question "is an impl for
Object,i32
usable where an impl for String,i32
is expected?"
Maybe it's helpful to think of a dictionary-passing implementation of
type classes. In that case, convertAll()
takes an implicit parameter
representing the impl. In short, we have an impl of type:
V_O = ConvertTo<i32> for Object
and the function prototype expects an impl of type:
V_S = ConvertTo<i32> for String
As with any argument, this is legal if the type of the value given
(V_O
) is a subtype of the type expected (V_S
). So is V_O <: V_S
?
The answer will depend on the variance of the various parameters. In
this case, because the Self
parameter is contravariant and A
is
covariant, it means that:
V_O <: V_S iff
i32 <: i32
String <: Object
These conditions are satisfied and so we are happy.
Variance and associated types
Traits with associated types – or at minimum projection expressions – must be invariant with respect to all of their inputs. To see why this makes sense, consider what subtyping for a trait reference means:
<T as Trait> <: <U as Trait>
means that if I know that T as Trait
, I also know that U as Trait
. Moreover, if you think of it as dictionary passing style,
it means that a dictionary for <T as Trait>
is safe to use where
a dictionary for <U as Trait>
is expected.
The problem is that when you can project types out from <T as Trait>
, the relationship to types projected out of <U as Trait>
is completely unknown unless T==U
(see #21726 for more
details). Making Trait
invariant ensures that this is true.
Another related reason is that if we didn't make traits with associated types invariant, then projection is no longer a function with a single result. Consider:
trait Identity { type Out; fn foo(&self); }
impl<T> Identity for T { type Out = T; ... }
Now if I have <&'static () as Identity>::Out
, this can be
validly derived as &'a ()
for any 'a
:
<&'a () as Identity> <: <&'static () as Identity>
if &'static () <: &'a () -- Identity is contravariant in Self
if 'static : 'a -- Subtyping rules for relations
This change, on the other hand, means that `<&'static () as Identity>::Out` is always `&'static ()` (which might then be upcast to `&'a ()`, separately). This was helpful in solving #21750.
Existential Types
Existential types are essentially strong type aliases which only expose a specific set of traits as their interface and the concrete type in the background is inferred from a certain set of use sites of the existential type.
In the language they are expressed via
existential type Foo: Bar;
This is an existential type named `Foo`, which can be interacted with via the `Bar` trait's interface.
Since there needs to be a concrete background type, you can currently express that type by using the existential type in a "defining use site".
struct Struct;
impl Bar for Struct { /* stuff */ }
fn foo() -> Foo {
Struct
}
Any other "defining use site" needs to produce the exact same type.
Defining use site(s)
Currently only the return value of a function can be a defining use site of an existential type (and only if the return type of that function contains the existential type).
The defining use of an existential type can be any code within the parent of the existential type definition. This includes any siblings of the existential type and all children of the siblings.
The initiative for "not causing fatal brain damage to developers due to accidentally running infinite loops in their brain while trying to comprehend what the type system is doing" has decided to disallow children of existential types to be defining use sites.
Associated existential types
Associated existential types can be defined by any other associated item
on the same trait impl
or a child of these associated items.
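For instance (an illustrative sketch reusing the `Bar` trait and `Struct` type from above; the `Baz` trait and `Empty` type are made up for this example):

```rust
trait Baz {
    type Assoc;
    fn make() -> Self::Assoc;
}

struct Empty;

impl Baz for Empty {
    existential type Assoc: Bar;

    // This sibling associated item is a defining use site: it fixes
    // the concrete type behind `Assoc` to `Struct`.
    fn make() -> Self::Assoc {
        Struct
    }
}
```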
The MIR (Mid-level IR)
MIR is Rust's Mid-level Intermediate Representation. It is constructed from HIR. MIR was introduced in RFC 1211. It is a radically simplified form of Rust that is used for certain flow-sensitive safety checks – notably the borrow checker! – and also for optimization and code generation.
If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (such as control-flow graphs and desugaring), you may enjoy the rust-lang blog post that introduced MIR.
Introduction to MIR
MIR is defined in the src/librustc/mir/
module, but much of the code
that manipulates it is found in src/librustc_mir
.
Some of the key characteristics of MIR are:
- It is based on a control-flow graph.
- It does not have nested expressions.
- All types in MIR are fully explicit.
Key MIR vocabulary
This section introduces the key concepts of MIR, summarized here:
- Basic blocks: units of the control-flow graph, consisting of:
- statements: actions with one successor
- terminators: actions with potentially multiple successors; always at the end of a block
- (if you're not familiar with the term basic block, see the background chapter)
- Locals: Memory locations allocated on the stack (conceptually, at
least), such as function arguments, local variables, and
temporaries. These are identified by an index, written with a
leading underscore, like
_1
. There is also a special "local" (_0
) allocated to store the return value. - Places: expressions that identify a location in memory, like
_1
or_1.f
. - Rvalues: expressions that produce a value. The "R" stands for
the fact that these are the "right-hand side" of an assignment.
- Operands: the arguments to an rvalue, which can either be a
constant (like
22
) or a place (like_1
).
- Operands: the arguments to an rvalue, which can either be a
constant (like
You can get a feeling for how MIR is structured by translating simple programs into MIR and reading the pretty printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try putting this program into play (or clicking on this link), and then clicking the "MIR" button on the top:
fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
}
You should see something like:
// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn main() -> () {
...
}
This is the MIR format for the main
function.
Variable declarations. If we drill in a bit, we'll see it begins with a bunch of variable declarations. They look like this:
let mut _0: (); // return place
scope 1 {
let mut _1: std::vec::Vec<i32>; // "vec" in scope 1 at src/main.rs:2:9: 2:16
}
scope 2 {
}
let mut _2: ();
let mut _3: &mut std::vec::Vec<i32>;
let mut _4: ();
let mut _5: &mut std::vec::Vec<i32>;
You can see that variables in MIR don't have names, they have indices,
like _0
or _1
. We also intermingle the user's variables (e.g.,
_1
) with temporary values (e.g., _2
or _3
). You can tell the
difference because user-defined variables have a comment that gives
you their original name (// "vec" in scope 1...
). The "scope" blocks
(e.g., scope 1 { .. }
) describe the lexical structure of the source
program (which names were in scope when).
Basic blocks. Reading further, we see our first basic block (naturally it may look slightly different when you view it, and I am ignoring some of the comments):
bb0: {
    StorageLive(_1);
    _1 = const <std::vec::Vec<T>>::new() -> bb2;
}
A basic block is defined by a series of statements and a final terminator. In this case, there is one statement:
StorageLive(_1);
This statement indicates that the variable _1
is "live", meaning
that it may be used later – this will persist until we encounter a
StorageDead(_1)
statement, which indicates that the variable _1
is
done being used. These "storage statements" are used by LLVM to
allocate stack space.
The terminator of the block bb0
is the call to Vec::new
:
_1 = const <std::vec::Vec<T>>::new() -> bb2;
Terminators are different from statements because they can have more
than one successor – that is, control may flow to different
places. Function calls like the call to Vec::new
are always
terminators because of the possibility of unwinding, although in the
case of Vec::new
we are able to see that indeed unwinding is not
possible, and hence we list only one successor block, bb2
.
If we look ahead to bb2
, we will see it looks like this:
bb2: {
    StorageLive(_3);
    _3 = &mut _1;
    _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
}
Here there are two statements: another StorageLive
, introducing the _3
temporary, and then an assignment:
_3 = &mut _1;
Assignments in general have the form:
<Place> = <Rvalue>
A place is an expression like _3
, _3.f
or *_3
– it denotes a
location in memory. An Rvalue is an expression that creates a
value: in this case, the rvalue is a mutable borrow expression, which
looks like &mut <Place>
. So we can kind of define a grammar for
rvalues like so:
<Rvalue>  = & (mut)? <Place>
          | <Operand> + <Operand>
          | <Operand> - <Operand>
          | ...

<Operand> = Constant
          | copy Place
          | move Place
As you can see from this grammar, rvalues cannot be nested – they can
only reference places and constants. Moreover, when you use a place,
we indicate whether we are copying it (which requires that the
place have a type T
where T: Copy
) or moving it (which works
for a place of any type). So, for example, if we had the expression x = a + b + c
in Rust, that would get compiled to two statements and a
temporary:
TMP1 = a + b
x = TMP1 + c
(Try it and see, though you may want to do release mode to skip over the overflow checks.)
MIR data types
The MIR data types are defined in the src/librustc/mir/
module. Each of the key concepts mentioned in the previous section
maps in a fairly straightforward way to a Rust type.
The main MIR data type is Mir
. It contains the data for a single
function (along with sub-instances of Mir for "promoted constants",
but you can read about those below).
- Basic blocks: The basic blocks are stored in the field
basic_blocks
; this is a vector ofBasicBlockData
structures. Nobody ever references a basic block directly: instead, we pass aroundBasicBlock
values, which are newtype'd indices into this vector. - Statements are represented by the type
Statement
. - Terminators are represented by the
Terminator
. - Locals are represented by a newtype'd index type
Local
. The data for a local variable is found in theMir
(thelocal_decls
vector). There is also a special constantRETURN_PLACE
identifying the special "local" representing the return value. - Places are identified by the enum
Place
. There are a few variants:- Local variables like
_1
- Static variables
FOO
- Projections, which are fields or other things that "project
out" from a base place. So e.g. the place
_1.f
is a projection, withf
being the "projection element and_1
being the base path.*_1
is also a projection, with the*
being represented by theProjectionElem::Deref
element.
- Local variables like
- Rvalues are represented by the enum
Rvalue
. - Operands are represented by the enum
Operand
.
Representing constants
to be written
Promoted constants
to be written
MIR construction
The lowering of HIR to MIR occurs for the following (probably incomplete) list of items:
- Function and Closure bodies
- Initializers of
static
andconst
items - Initializers of enum discriminants
- Glue and Shims of any kind
- Tuple struct initializer functions
- Drop code (the
Drop::drop
function is not called directly) - Drop implementations of types without an explicit
Drop
implementation
The lowering is triggered by calling the mir_built
query.
There is an intermediate representation
between HIR and MIR called the HAIR that is only used during the lowering.
The HAIR's most important feature is that the various adjustments (which happen
without explicit syntax) like coercions, autoderef, autoref and overloaded method
calls have become explicit casts, deref operations, reference expressions or
concrete function calls.
The HAIR has datatypes that mirror the HIR datatypes, but instead of e.g. -x
being a hair::ExprKind::Neg(hair::Expr)
it is a hair::ExprKind::Neg(hir::Expr)
.
This shallowness enables the HAIR
to represent all datatypes that HIR has, but
without having to create an in-memory copy of the entire HIR.
MIR lowering will first convert the topmost expression from
HIR to HAIR (in rustc_mir::hair::cx::expr) and then process
the HAIR expressions recursively.
The lowering creates local variables for every argument as specified in the signature.
Next it creates local variables for every binding specified; e.g. `(a, b): (i32, String)` produces three locals: one for the argument, and two for the bindings. It then generates field accesses that read the fields from the argument and write the values to the binding variables.
With this initialization out of the way, the lowering triggers a recursive call
to a function that generates the MIR for the body (a Block
expression) and
writes the result into the RETURN_PLACE
.
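For instance, a signature with a tuple pattern might conceptually be set up like this (an illustrative sketch, not actual compiler output):

```rust
#![allow(unused_variables)]

fn foo((a, b): (i32, String)) {
    // The lowering introduces, conceptually:
    //   _1        <- the argument local (the whole tuple)
    //   a's local <- (_1).0   (a field access reading from the argument)
    //   b's local <- (_1).1
}
```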
unpack!
all the things
Functions that generate MIR tend to fall into one of two patterns. First, if the function generates only statements, then it will take a basic block as argument onto which those statements should be appended. It can then return a result as normal:
fn generate_some_mir(&mut self, block: BasicBlock) -> ResultType {
...
}
But there are other functions that may generate new basic blocks as well.
For example, lowering an expression like if foo { 22 } else { 44 }
requires generating a small "diamond-shaped graph".
In this case, the functions take a basic block where their code starts
and return a (potentially) new basic block where the code generation ends.
The BlockAnd
type is used to represent this:
fn generate_more_mir(&mut self, block: BasicBlock) -> BlockAnd<ResultType> {
...
}
When you invoke these functions, it is common to have a local variable block
that is effectively a "cursor". It represents the point at which we are adding new MIR.
When you invoke generate_more_mir
, you want to update this cursor.
You can do this manually, but it's tedious:
let mut block;
let v = match self.generate_more_mir(..) {
BlockAnd { block: new_block, value: v } => {
block = new_block;
v
}
};
For this reason, we offer a macro that lets you write
let v = unpack!(block = self.generate_more_mir(...))
.
It simply extracts the new block and overwrites the
variable block
that you named in the unpack!
.
Lowering expressions into the desired MIR
There are essentially four kinds of representations one might want of an expression:
Place
refers to a (or part of a) preexisting memory location (local, static, promoted)Rvalue
is something that can be assigned to aPlace
Operand
is an argument to e.g. a+
operation or a function call- a temporary variable containing a copy of the value
The following image depicts a general overview of the interactions between the representations:
Click here for a more detailed view
We start out with lowering the function body to an Rvalue
so we can create an
assignment to RETURN_PLACE
. This Rvalue
lowering will in turn trigger lowering to
Operand
for its arguments (if any). Operand
lowering either produces a const
operand, or moves/copies out of a Place
, thus triggering a Place
lowering. An
expression being lowered to a Place
can in turn trigger a temporary to be created
if the expression being lowered contains operations. This is where the snake bites its
own tail and we need to trigger an Rvalue
lowering for the expression to be written
into the local.
Operator lowering
Operators on builtin types are not lowered to function calls (which would end up being
infinite recursion calls, because the trait impls just contain the operation itself
again). Instead there are Rvalue
s for binary and unary operators and index operations.
These Rvalue
s later get codegened to llvm primitive operations or llvm intrinsics.
Operators on all other types get lowered to a function call to their impl
of the
operator's corresponding trait.
Regardless of the lowering kind, the arguments to the operator are lowered to Operand
s.
This means all arguments are either constants, or refer to an already existing value
somewhere in a local or static.
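Schematically, the two cases differ like this (illustrative MIR only, eliding the checked-arithmetic variants that debug builds introduce; `Foo` stands in for some user type):

```
// builtin type: a single Rvalue in an assignment statement
_3 = Add(move _1, move _2);

// any other type: a call terminator targeting the trait impl
_3 = <Foo as std::ops::Add>::add(move _1, move _2) -> bb2;
```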
Method call lowering
Method calls are lowered to the same TerminatorKind
that function calls are.
In MIR there is no difference between method calls and function calls anymore.
Conditions
if
conditions and match
statements for enum
s whose variants have no fields are
lowered to TerminatorKind::SwitchInt
. Each possible value (so 0
and 1
for if
conditions) has a corresponding BasicBlock
to which the code continues.
The argument being branched on is (again) an Operand
representing the value of
the if condition.
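Schematically, an `if` lowers to something like the following (illustrative MIR; `_2` here is assumed to hold the boolean value of the condition):

```
bb0: {
    ...
    switchInt(move _2) -> [false: bb2, otherwise: bb1];
}
// bb1: the "then" block; bb2: the "else" block
```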
Pattern matching
match
statements for enum
s with variants that have fields are lowered to
TerminatorKind::SwitchInt
, too, but the Operand
refers to a Place
where the
discriminant of the value can be found. This often involves reading the discriminant
to a new temporary variable.
Aggregate construction
Aggregate values of any kind (e.g. structs or tuples) are built via Rvalue::Aggregate
.
All fields are lowered to `Operand`s. This is essentially equivalent to one assignment
statement per aggregate field plus an assignment to the discriminant in the
case of enum
s.
MIR visitor
The MIR visitor is a convenient tool for traversing the MIR and either
looking for things or making changes to it. The visitor traits are
defined in the rustc::mir::visit
module – there are two of
them, generated via a single macro: Visitor
(which operates on a
&Mir
and gives back shared references) and MutVisitor
(which
operates on a &mut Mir
and gives back mutable references).
To implement a visitor, you have to create a type that represents your visitor. Typically, this type wants to "hang on" to whatever state you will need while processing MIR:
struct MyVisitor<...> {
tcx: TyCtxt<'cx, 'tcx, 'tcx>,
...
}
and you then implement the Visitor
or MutVisitor
trait for that type:
impl<'tcx> MutVisitor<'tcx> for NoLandingPads {
fn visit_foo(&mut self, ...) {
...
self.super_foo(...);
}
}
As shown above, within the impl, you can override any of the
visit_foo
methods (e.g., visit_terminator
) in order to write some
code that will execute whenever a foo
is found. If you want to
recursively walk the contents of the foo
, you then invoke the
super_foo
method. (NB. You never want to override super_foo
.)
A very simple example of a visitor can be found in NoLandingPads
.
That visitor doesn't even require any state: it just visits all
terminators and removes their unwind
successors.
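In the spirit of that pass, an override might look roughly like this (simplified from the real source; the precise method signatures have varied across compiler versions):

```rust
impl<'tcx> MutVisitor<'tcx> for NoLandingPads {
    fn visit_terminator(&mut self,
                        block: BasicBlock,
                        terminator: &mut Terminator<'tcx>,
                        location: Location) {
        if let Some(unwind) = terminator.kind.unwind_mut() {
            unwind.take(); // drop the unwind successor, if any
        }
        // Still recurse, so nested contents get visited too.
        self.super_terminator(block, terminator, location);
    }
}
```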
Traversal
In addition to the visitor, the rustc::mir::traversal
module
contains useful functions for walking the MIR CFG in
different standard orders (e.g. pre-order, reverse
post-order, and so forth).
MIR passes
If you would like to get the MIR for a function (or constant, etc),
you can use the optimized_mir(def_id)
query. This will give you back
the final, optimized MIR. For foreign def-ids, we simply read the MIR
from the other crate's metadata. But for local def-ids, the query will
construct the MIR and then iteratively optimize it by applying a
series of passes. This section describes how those passes work and how
you can extend them.
To produce the optimized_mir(D)
for a given def-id D
, the MIR
passes through several suites of optimizations, each represented by a
query. Each suite consists of multiple optimizations and
transformations. These suites represent useful intermediate points
where we want to access the MIR for type checking or other purposes:
mir_build(D)
– not a query, but this constructs the initial MIRmir_const(D)
– applies some simple transformations to make MIR ready for constant evaluation;mir_validated(D)
– applies some more transformations, making MIR ready for borrow checking;optimized_mir(D)
– the final state, after all optimizations have been performed.
Implementing and registering a pass
A MirPass
is some bit of code that processes the MIR, typically –
but not always – transforming it along the way somehow. For example,
it might perform an optimization. The MirPass
trait itself is found
in the rustc_mir::transform
module, and it
basically consists of one method, run_pass
, that simply gets an
&mut Mir
(along with the tcx and some information about where it
came from). The MIR is therefore modified in place (which helps to
keep things efficient).
A good example of a basic MIR pass is NoLandingPads
, which walks
the MIR and removes all edges that are due to unwinding – this is
used when configured with panic=abort
, which never unwinds. As you
can see from its source, a MIR pass is defined by first defining a
dummy type, a struct with no fields, something like:
struct MyPass;
for which you then implement the MirPass
trait. You can then insert
this pass into the appropriate list of passes found in a query like
optimized_mir
, mir_validated
, etc. (If this is an optimization, it
should go into the optimized_mir
list.)
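Filling in the sketch, an implementation might look roughly like this (hedged: the exact `run_pass` signature has changed over time, so consult `rustc_mir::transform` for the current one):

```rust
impl MirPass for MyPass {
    fn run_pass<'a, 'tcx>(&self,
                          tcx: TyCtxt<'a, 'tcx, 'tcx>,
                          source: MirSource,
                          mir: &mut Mir<'tcx>) {
        // Transform `mir` in place, e.g. by walking it with a MIR visitor
        // and rewriting the statements or terminators you care about.
    }
}
```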
If you are writing a pass, there's a good chance that you are going to want to use a MIR visitor. MIR visitors are a handy way to walk all the parts of the MIR, either to search for something or to make small edits.
Stealing
The intermediate queries mir_const()
and mir_validated()
yield up
a &'tcx Steal<Mir<'tcx>>
, allocated using
tcx.alloc_steal_mir()
. This indicates that the result may be
stolen by the next suite of optimizations – this is an
optimization to avoid cloning the MIR. Attempting to use a stolen
result will cause a panic in the compiler. Therefore, it is important
that you do not read directly from these intermediate queries except as
part of the MIR processing pipeline.
Because of this stealing mechanism, some care must also be taken to
ensure that, before the MIR at a particular phase in the processing
pipeline is stolen, anyone who may want to read from it has already
done so. Concretely, this means that if you have some query foo(D)
that wants to access the result of mir_const(D)
or
mir_validated(D)
, you need to have the successor pass "force"
foo(D)
using ty::queries::foo::force(...)
. This will force a query
to execute even though you don't directly require its result.
As an example, consider MIR const qualification. It wants to read the
result produced by the mir_const()
suite. However, that result will
be stolen by the mir_validated()
suite. If nothing was done,
then mir_const_qualif(D)
would succeed if it came before
mir_validated(D)
, but fail otherwise. Therefore, mir_validated(D)
will force mir_const_qualif
before it actually steals, thus
ensuring that the reads have already happened (remember that
queries are memoized, so executing a query twice
simply loads from a cache the second time):
mir_const(D) --read-by--> mir_const_qualif(D)
| ^
stolen-by |
| (forces)
v |
mir_validated(D) ------------+
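A hedged sketch of the forcing step itself, using the `force` helper named above (the exact argument list is illustrative):

```rust
// Inside the mir_validated(D) suite, before stealing mir_const(D):
// force mir_const_qualif(D) so that its read of mir_const(D) has
// already happened by the time the steal occurs.
ty::queries::mir_const_qualif::force(tcx, DUMMY_SP, def_id);
```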
This mechanism is a bit dodgy. There is a discussion of more elegant alternatives in rust-lang/rust#41710.
MIR optimizations
MIR Debugging
The -Zdump-mir
flag can be used to dump a text representation of the MIR. The
-Zdump-mir-graphviz
flag can be used to dump a .dot
file that represents
MIR as a control-flow graph.
`-Zdump-mir=F` is a handy compiler option that will let you view the MIR for each function at each stage of compilation. `-Zdump-mir` takes a filter `F` which allows you to control which functions and which passes you are interested in. For example:
> rustc -Zdump-mir=foo ...
This will dump the MIR for any function whose name contains foo
; it
will dump the MIR both before and after every pass. Those files will
be created in the mir_dump
directory. There will likely be quite a
lot of them!
> cat > foo.rs
fn main() {
println!("Hello, world!");
}
^D
> rustc -Zdump-mir=main foo.rs
> ls mir_dump/* | wc -l
161
The files have names like rustc.main.000-000.CleanEndRegions.after.mir
. These
names have a number of parts:
rustc.main.000-000.CleanEndRegions.after.mir
---- --- --- --------------- ----- either before or after
| | | name of the pass
| | index of dump within the pass (usually 0, but some passes dump intermediate states)
| index of the pass
def-path to the function etc being dumped
You can also make more selective filters. For example, main & CleanEndRegions
will select for things that reference both main
and the pass
CleanEndRegions
:
> rustc -Zdump-mir='main & CleanEndRegions' foo.rs
> ls mir_dump
rustc.main.000-000.CleanEndRegions.after.mir rustc.main.000-000.CleanEndRegions.before.mir
Filters can also have |
parts to combine multiple sets of
&
-filters. For example main & CleanEndRegions | main & NoLandingPads
will select either main
and CleanEndRegions
or
main
and NoLandingPads
:
> rustc -Zdump-mir='main & CleanEndRegions | main & NoLandingPads' foo.rs
> ls mir_dump
rustc.main-promoted[0].002-000.NoLandingPads.after.mir
rustc.main-promoted[0].002-000.NoLandingPads.before.mir
rustc.main-promoted[0].002-006.NoLandingPads.after.mir
rustc.main-promoted[0].002-006.NoLandingPads.before.mir
rustc.main-promoted[1].002-000.NoLandingPads.after.mir
rustc.main-promoted[1].002-000.NoLandingPads.before.mir
rustc.main-promoted[1].002-006.NoLandingPads.after.mir
rustc.main-promoted[1].002-006.NoLandingPads.before.mir
rustc.main.000-000.CleanEndRegions.after.mir
rustc.main.000-000.CleanEndRegions.before.mir
rustc.main.002-000.NoLandingPads.after.mir
rustc.main.002-000.NoLandingPads.before.mir
rustc.main.002-006.NoLandingPads.after.mir
rustc.main.002-006.NoLandingPads.before.mir
(Here, the main-promoted[0]
files refer to the MIR for "promoted constants"
that appeared within the main
function.)
TODO: anything else?
MIR borrow check
The borrow check is Rust's "secret sauce" – it is tasked with enforcing a number of properties:
- That all variables are initialized before they are used.
- That you can't move the same value twice.
- That you can't move a value while it is borrowed.
- That you can't access a place while it is mutably borrowed (except through the reference).
- That you can't mutate a place while it is shared borrowed.
- etc
At the time of this writing, the code is in a state of transition. The "main" borrow checker still works by processing the HIR, but that is being phased out in favor of the MIR-based borrow checker. Accordingly, this documentation focuses on the new, MIR-based borrow checker.
Doing borrow checking on MIR has several advantages:
- The MIR is far less complex than the HIR; the radical desugaring helps prevent bugs in the borrow checker. (If you're curious, you can see a list of bugs that the MIR-based borrow checker fixes here.)
- Even more importantly, using the MIR enables "non-lexical lifetimes", which are regions derived from the control-flow graph.
Major phases of the borrow checker
The borrow checker source is found in
the rustc_mir::borrow_check
module. The main entry point is
the mir_borrowck
query.
- We first create a local copy of the MIR. In the coming steps, we will modify this copy in place to modify the types and things to include references to the new regions that we are computing.
- We then invoke `replace_regions_in_mir` to modify our local MIR. Among other things, this function will replace all of the regions in the MIR with fresh inference variables.
- Next, we perform a number of dataflow analyses that compute what data is moved and when.
- We then do a second type check across the MIR: the purpose of this type check is to determine all of the constraints between different regions.
- Next, we do region inference, which computes the values of each region — basically, the points in the control-flow graph where each lifetime must be valid according to the constraints we collected.
- At this point, we can compute the "borrows in scope" at each point.
- Finally, we do a second walk over the MIR, looking at the actions it does and reporting errors. For example, if we see a statement like `*a + 1`, then we would check that the variable `a` is initialized and that it is not mutably borrowed, as either of those would require an error to be reported. Doing this check requires the results of all the previous analyses.
Tracking moves and initialization
Part of the borrow checker's job is to track which variables are "initialized" at any given point in time -- this also requires figuring out where moves occur and tracking those.
Initialization and moves
From a user's perspective, initialization -- giving a variable some value -- and moves -- transferring ownership to another place -- might seem like distinct topics. Indeed, our borrow checker error messages often talk about them differently. But within the borrow checker, they are not nearly as separate. Roughly speaking, the borrow checker tracks the set of "initialized places" at any point in the source code. Assigning to a previously uninitialized local variable adds it to that set; moving from a local variable removes it from that set.
Consider this example:
fn foo() {
let a: Vec<u32>;
// a is not initialized yet
a = vec![22];
// a is initialized here
std::mem::drop(a); // a is moved here
// a is no longer initialized here
let l = a.len(); //~ ERROR
}
Here you can see that a
starts off as uninitialized; once it is
assigned, it becomes initialized. But when drop(a)
is called, that
moves a
into the call, and hence it becomes uninitialized again.
Subsections
To make it easier to peruse, this section is broken into a number of subsections:
- Move paths: the move path concept that we use to track which local variables (or parts of local variables, in some cases) are initialized.
- TODO Rest not yet written =)
Move paths
In reality, it's not enough to track initialization at the granularity of local variables. Rust also allows us to do moves and initialization at the field granularity:
fn foo() {
let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
// a.0 and a.1 are both initialized
let b = a.0; // moves a.0
// a.0 is not initialized, but a.1 still is
let c = a.0; // ERROR
let d = a.1; // OK
}
To handle this, we track initialization at the granularity of a move
path. A MovePath
represents some location that the user can
initialize, move, etc. So e.g. there is a move-path representing the
local variable a
, and there is a move-path representing a.0
. Move
paths roughly correspond to the concept of a Place
from MIR, but
they are indexed in ways that enable us to do move analysis more
efficiently.
Move path indices
Although there is a MovePath
data structure, they are never referenced
directly. Instead, all the code passes around indices of type
MovePathIndex
. If you need to get information about a move path, you use
this index with the move_paths
field of the MoveData
. For
example, to convert a MovePathIndex
mpi
into a MIR Place
, you might
access the MovePath::place
field like so:
move_data.move_paths[mpi].place
Building move paths
One of the first things we do in the MIR borrow check is to construct
the set of move paths. This is done as part of the
MoveData::gather_moves
function. This function uses a MIR visitor
called Gatherer
to walk the MIR and look at how each Place
within is accessed. For each such Place
, it constructs a
corresponding MovePathIndex
. It also records when/where that
particular move path is moved/initialized, but we'll get to that in a
later section.
Illegal move paths
We don't actually create a move-path for every Place
that gets
used. In particular, if it is illegal to move from a Place
, then
there is no need for a MovePathIndex
. Some examples:
- You cannot move from a static variable, so we do not create a `MovePathIndex` for static variables.
- You cannot move an individual element of an array, so if we have e.g. `foo: [String; 3]`, there would be no move-path for `foo[1]`.
- You cannot move from inside of a borrowed reference, so if we have e.g. `foo: &String`, there would be no move-path for `*foo`.
These rules are enforced by the move_path_for
function, which
converts a Place
into a MovePathIndex
-- in error cases like
those just discussed, the function returns an Err
. This in turn
means we don't have to bother tracking whether those places are
initialized (which lowers overhead).
Looking up a move-path
If you have a `Place` and you would like to convert it to a `MovePathIndex`, you can do that using the `MovePathLookup` structure found in the `rev_lookup` field of `MoveData`. There are two different methods:
- `find_local`, which takes a `mir::Local` representing a local variable. This is the easier method, because we always create a `MovePathIndex` for every local variable.
- `find`, which takes an arbitrary `Place`. This method is a bit more annoying to use, precisely because we don't have a `MovePathIndex` for every `Place` (as we just discussed in the "illegal move paths" section). Therefore, `find` returns a `LookupResult` indicating the closest path it was able to find that exists (e.g., for `foo[1]`, it might return just the path for `foo`).
Cross-references
As we noted above, move-paths are stored in a big vector and
referenced via their MovePathIndex
. However, within this vector,
they are also structured into a tree. So for example if you have the
MovePathIndex
for a.b.c
, you can go to its parent move-path
a.b
. You can also iterate over all children paths: so, from a.b
,
you might iterate to find the path a.b.c
(here you are iterating
just over the paths that are actually referenced in the source,
not all possible paths that could have been referenced). These
references are used for example in the has_any_child_of
function,
which checks whether the dataflow results contain a value for the
given move-path (e.g., a.b
) or any child of that move-path (e.g.,
a.b.c
).
The MIR type-check
A key component of the borrow check is the MIR type-check. This check walks the MIR and does a complete "type check" -- the same kind you might find in any other language. In the process of doing this type-check, we also uncover the region constraints that apply to the program.
TODO -- elaborate further? Maybe? :)
Region inference (NLL)
The MIR-based region checking code is located in
the rustc_mir::borrow_check::nll
module. (NLL, of course,
stands for "non-lexical lifetimes", a term that will hopefully be
deprecated once they become the standard kind of lifetime.)
The MIR-based region analysis consists of two major functions:
- `replace_regions_in_mir`, invoked first, has two jobs:
  - First, it finds the set of regions that appear within the signature of the function (e.g., `'a` in `fn foo<'a>(&'a u32) { ... }`). These are called the "universal" or "free" regions – in particular, they are the regions that appear free in the function body.
  - Second, it replaces all the regions from the function body with fresh inference variables. This is because (presently) those regions are the results of lexical region inference and hence are not of much interest. The intention is that – eventually – they will be "erased regions" (i.e., no information at all), since we won't be doing lexical region inference at all.
- `compute_regions`, invoked second: this is given as argument the results of move analysis. It has the job of computing values for all the inference variables that `replace_regions_in_mir` introduced.
  - To do that, it first runs the MIR type checker. This is basically a normal type-checker but specialized to MIR, which is much simpler than full Rust, of course. Running the MIR type checker will however create outlives constraints between region variables (e.g., that one variable must outlive another one) to reflect the subtyping relationships that arise.
  - It also adds liveness constraints that arise from where variables are used.
  - After this, we create a `RegionInferenceContext` with the constraints we have computed and the inference variables we introduced, and use the `solve` method to infer values for all region inference variables.
- The NLL RFC also includes fairly thorough (and hopefully readable) coverage.
Universal regions
The `UniversalRegions` type represents a collection of universal regions corresponding to some MIR `DefId`. It is constructed in `replace_regions_in_mir` when we replace all regions with fresh inference variables. `UniversalRegions` contains indices for all the free regions in the given MIR along with any relationships that are known to hold between them (e.g. implied bounds, where clauses, etc.).
For example, given the MIR for the following function:
```rust
fn foo<'a>(x: &'a u32) {
    // ...
}
```
we would create a universal region for 'a
and one for 'static
. There may
also be some complications for handling closures, but we will ignore those for
the moment.
TODO: write about how these regions are computed.
Region variables
The value of a region can be thought of as a set. This set contains all
points in the MIR where the region is valid along with any regions that are
outlived by this region (e.g. if 'a: 'b
, then end('b)
is in the set for
'a
); we call the domain of this set a RegionElement
. In the code, the value
for all regions is maintained in the
rustc_mir::borrow_check::nll::region_infer
module. For each region we
maintain a set storing what elements are present in its value (to make this
efficient, we give each kind of element an index, the RegionElementIndex
, and
use sparse bitsets).
The kinds of region elements are as follows:
- Each location in the MIR control-flow graph: a location is just the pair of a basic block and an index. This identifies the point on entry to the statement with that index (or the terminator, if the index is equal to `statements.len()`).
- There is an element `end('a)` for each universal region `'a`, corresponding to some portion of the caller's (or caller's caller, etc.) control-flow graph.
- Similarly, there is an element denoted `end('static)` corresponding to the remainder of program execution after this function returns.
- There is an element `!1` for each placeholder region `!1`. This corresponds (intuitively) to some unknown set of other elements – for details on placeholders, see the section placeholders and universes.
Constraints
Before we can infer the value of regions, we need to collect constraints on the regions. There are two primary types of constraints.
- Outlives constraints. These are constraints that one region outlives another (e.g. `'a: 'b`). Outlives constraints are generated by the MIR type checker.
- Liveness constraints. Each region needs to be live at points where it can be used. These constraints are collected by `generate_constraints`.
Inference Overview
So how do we compute the contents of a region? This process is called region inference. The high-level idea is pretty simple, but there are some details we need to take care of.
Here is the high-level idea: we start off each region with the MIR locations we
know must be in it from the liveness constraints. From there, we use all of the
outlives constraints computed from the type checker to propagate the
constraints: for each region 'a
, if 'a: 'b
, then we add all elements of
'b
to 'a
, including end('b)
. This all happens in
propagate_constraints
.
Then, we will check for errors. We first check that type tests are satisfied by
calling check_type_tests
. This checks constraints like T: 'a
. Second, we
check that universal regions are not "too big". This is done by calling
check_universal_regions
. This checks that for each region 'a
if 'a
contains the element end('b)
, then we must already know that 'a: 'b
holds
(e.g. from a where clause). If we don't already know this, that is an error...
well, almost. There is some special handling for closures that we will discuss
later.
Example
Consider the following example:
fn foo<'a, 'b>(x: &'a usize) -> &'b usize {
x
}
Clearly, this should not compile because we don't know if 'a
outlives 'b
(if it doesn't then the return value could be a dangling reference).
Let's back up a bit. We need to introduce some free inference variables (as is
done in replace_regions_in_mir
). This example doesn't use the exact regions
produced, but it (hopefully) is enough to get the idea across.
fn foo<'a, 'b>(x: &'a /* '#1 */ usize) -> &'b /* '#3 */ usize {
x // '#2, location L1
}
Some notation: '#1
, '#3
, and '#2
represent the universal regions for the
argument, return value, and the expression x
, respectively. Additionally, I
will call the location of the expression x
L1
.
So now we can use the liveness constraints to get the following starting points:
| Region | Contents |
|--------|----------|
| '#1    |          |
| '#2    | L1       |
| '#3    | L1       |
Now we use the outlives constraints to expand each region. Specifically, we
know that '#2: '#3
...
| Region | Contents |
|--------|----------|
| '#1    | L1       |
| '#2    | L1, end('#3) // add contents of '#3 and end('#3) |
| '#3    | L1       |
... and '#1: '#2
, so ...
| Region | Contents |
|--------|----------|
| '#1    | L1, end('#2), end('#3) // add contents of '#2 and end('#2) |
| '#2    | L1, end('#3) |
| '#3    | L1       |
Now, we need to check that no regions were too big (we don't have any type
tests to check in this case). Notice that '#1
now contains end('#3)
, but
we have no where
clause or implied bound to say that 'a: 'b
... that's an
error!
Some details
The RegionInferenceContext
type contains all of the information needed to
do inference, including the universal regions from replace_regions_in_mir
and
the constraints computed for each region. It is constructed just after we
compute the liveness constraints.
Here are some of the fields of the struct:
- `constraints`: contains all the outlives constraints.
- `liveness_constraints`: contains all the liveness constraints.
- `universal_regions`: contains the `UniversalRegions` returned by `replace_regions_in_mir`.
- `universal_region_relations`: contains relations known to be true about universal regions. For example, if we have a where clause that `'a: 'b`, that relation is assumed to be true while borrow checking the implementation (it is checked at the caller), so `universal_region_relations` would contain `'a: 'b`.
- `type_tests`: contains some constraints on types that we must check after inference (e.g. `T: 'a`).
- `closure_bounds_mapping`: used for propagating region constraints from closures back out to the creator of the closure.
TODO: should we discuss any of the other fields? What about the SCCs?
Ok, now that we have constructed a RegionInferenceContext
, we can do
inference. This is done by calling the solve
method on the context. This
is where we call propagate_constraints
and then check the resulting type
tests and universal regions, as discussed above.
Closures
When we are checking the type tests and universal regions, we may come across a constraint that we can't prove yet if we are in a closure body! However, the necessary constraints may actually hold (we just don't know it yet). Thus, if we are inside a closure, we just collect all the constraints we can't prove yet and return them. Later, when we borrow-check the MIR node that created the closure, we can also check that these constraints hold. At that time, if we can't prove they hold, we report an error.
Placeholders and universes
(This section describes ongoing work that hasn't landed yet.)
From time to time we have to reason about regions that we can't concretely know. For example, consider this program:
// A function that needs a static reference
fn foo(x: &'static u32) { }
fn bar(f: for<'a> fn(&'a u32)) {
// ^^^^^^^^^^^^^^^^^^^ a function that can accept **any** reference
let x = 22;
f(&x);
}
fn main() {
bar(foo);
}
This program ought not to type-check: `foo` needs a static reference for its argument, and `bar` wants to be given a function that accepts any reference (so it can call it with something on its stack, for example). But how do we reject it and why?
Subtyping and Placeholders
When we type-check main
, and in particular the call bar(foo)
, we
are going to wind up with a subtyping relationship like this one:
fn(&'static u32) <: for<'a> fn(&'a u32)
---------------- -------------------
the type of `foo` the type `bar` expects
We handle this sort of subtyping by taking the variables that are
bound in the supertype and replacing them with
universally quantified
representatives, written like !1
. We call these regions "placeholder
regions" – they represent, basically, "some unknown region".
Once we've done that replacement, we have the following relation:
fn(&'static u32) <: fn(&'!1 u32)
The key idea here is that this unknown region '!1
is not related to
any other regions. So if we can prove that the subtyping relationship
is true for '!1
, then it ought to be true for any region, which is
what we wanted.
So let's work through what happens next. To check if two functions are subtypes, we check if their arguments have the desired relationship (fn arguments are contravariant, so we swap the left and right here):
&'!1 u32 <: &'static u32
According to the basic subtyping rules for a reference, this will be true if `'!1: 'static`. That is – if "some unknown region `!1`" outlives `'static`. Now, this might be true – after all, `'!1` could be `'static` – but we don't know that it's true. So this should yield up an error (eventually).
What is a universe
In the previous section, we introduced the idea of a placeholder
region, and we denoted it !1
. We call this number 1
the universe
index. The idea of a "universe" is that it is a set of names that
are in scope within some type or at some point. Universes are formed
into a tree, where each child extends its parents with some new names.
So the root universe conceptually contains global names, such as the lifetime `'static` or the type `i32`. In the compiler, we also put generic type parameters into this root universe (in this sense, there is not just one root universe, but one per item). So consider this function `bar`:
struct Foo { }
fn bar<'a, T>(t: &'a T) {
...
}
Here, the root universe would consist of the lifetimes 'static
and
'a
. In fact, although we're focused on lifetimes here, we can apply
the same concept to types, in which case the types Foo
and T
would
be in the root universe (along with other global types, like i32
).
Basically, the root universe contains all the names that
appear free in the body of bar
.
Now let's extend bar
a bit by adding a variable x
:
fn bar<'a, T>(t: &'a T) {
let x: for<'b> fn(&'b u32) = ...;
}
Here, the name 'b
is not part of the root universe. Instead, when we
"enter" into this for<'b>
(e.g., by replacing it with a placeholder), we will create
a child universe of the root, let's call it U1:
U0 (root universe)
│
└─ U1 (child universe)
The idea is that this child universe U1 extends the root universe U0
with a new name, which we are identifying by its universe number:
!1
.
Now let's extend bar
a bit by adding one more variable, y
:
fn bar<'a, T>(t: &'a T) {
let x: for<'b> fn(&'b u32) = ...;
let y: for<'c> fn(&'c u32) = ...;
}
When we enter this type, we will again create a new universe, which
we'll call U2
. Its parent will be the root universe, and U1 will be
its sibling:
U0 (root universe)
│
├─ U1 (child universe)
│
└─ U2 (child universe)
This implies that, while in U2, we can name things from U0 or U2, but not U1.
Giving existential variables a universe. Now that we have this
notion of universes, we can use it to extend our type-checker and
things to prevent illegal names from leaking out. The idea is that we
give each inference (existential) variable – whether it be a type or
a lifetime – a universe. That variable's value can then only
reference names visible from that universe. So for example if a lifetime variable is created in U0, then it cannot be assigned a value of `!1` or `!2`, because those names are not visible from the universe U0.
Representing universes with just a counter. You might be surprised to see that the compiler doesn't keep track of a full tree of universes. Instead, it just keeps a counter – and, to determine if one universe can see another one, it just checks if the index is greater. For example, U2 can see U0 because 2 >= 0. But U0 cannot see U2, because 0 >= 2 is false.
How can we get away with this? Doesn't this mean that we would allow U2 to also see U1? The answer is that, yes, we would, if that question ever arose. But because of the structure of our type checker etc, there is no way for that to happen. In order for something happening in the universe U1 to "communicate" with something happening in U2, they would have to have a shared inference variable X in common. And because everything in U1 is scoped to just U1 and its children, that inference variable X would have to be in U0. And since X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest to see by using a kind of generic "logic" example:
exists<X> {
forall<Y> { ... /* Y is in U1 ... */ }
forall<Z> { ... /* Z is in U2 ... */ }
}
Here, the only way for the two foralls to interact would be through X, but neither Y nor Z are in scope when X is declared, so its value cannot reference either of them.
Universes and placeholder region elements
But where does that error come from? The way it happens is like this.
When we are constructing the region inference context, we can tell
from the type inference context how many placeholder variables exist
(the InferCtxt
has an internal counter). For each of those, we
create a corresponding universal region variable !n
and a "region
element" placeholder(n)
. This corresponds to "some unknown set of other
elements". The value of !n
is {placeholder(n)}
.
At the same time, we also give each existential variable a
universe (also taken from the InferCtxt
). This universe
determines which placeholder elements may appear in its value: for example, a variable in universe U3 may name `placeholder(1)`, `placeholder(2)`, and `placeholder(3)`, but not `placeholder(4)`. Note that the universe of an inference variable controls what region elements can appear in its value; it does not say that those elements will appear.
Placeholders and outlives constraints
In the region inference engine, outlives constraints have the form:
V1: V2 @ P
where V1
and V2
are region indices, and hence map to some region
variable (which may be universally or existentially quantified). The
P
here is a "point" in the control-flow graph; it's not important
for this section. This variable will have a universe, so let's call
those universes U(V1)
and U(V2)
respectively. (Actually, the only
one we are going to care about is U(V1)
.)
When we encounter this constraint, the ordinary procedure is to start
a DFS from P
. We keep walking so long as the nodes we are walking
are present in value(V2)
and we add those nodes to value(V1)
. If
we reach a return point, we add in any end(X)
elements. That part
remains unchanged.
But then, after that, we want to iterate over the placeholder `placeholder(x)` elements in V2 (each of those must be visible to `U(V2)`, but we should be able to just assume that is true; we don't have to check it). We have to ensure that `value(V1)` outlives each of those placeholder elements.
Now there are two ways that could happen. First, if U(V1)
can see
the universe x
(i.e., x <= U(V1)
), then we can just add placeholder(x)
to value(V1)
and be done. But if not, then we have to approximate:
we may not know what set of elements placeholder(x)
represents, but we
should be able to compute some sort of upper bound B for it –
some region B that outlives placeholder(x)
. For now, we'll just use
'static
for that (since it outlives everything) – in the future, we
can sometimes be smarter here (and in fact we have code for doing this
already in other contexts). Moreover, since 'static
is in the root
universe U0, we know that all variables can see it – so basically if
we find that value(V2)
contains placeholder(x)
for some universe x
that V1
can't see, then we force V1
to 'static
.
Extending the "universal regions" check
After all constraints have been propagated, the NLL region inference
has one final check, where it goes over the values that wound up being
computed for each universal region and checks that they did not get
'too large'. In our case, we will go through each placeholder region
and check that it contains only the placeholder(u)
element it is known to
outlive. (Later, we might be able to know that there are relationships
between two placeholder regions and take those into account, as we do
for universal regions from the fn signature.)
Put another way, the "universal regions" check can be considered to be checking constraints like:
{placeholder(1)}: V1
where {placeholder(1)}
is like a constant set, and V1 is the variable we
made to represent the !1
region.
Back to our example
OK, so far so good. Now let's walk through what would happen with our first example:
fn(&'static u32) <: fn(&'!1 u32) @ P // this point P is not imp't here
The region inference engine will create a region element domain like this:
{ CFG; end('static); placeholder(1) }
--- ------------ ------- from the universe `!1`
| 'static is always in scope
all points in the CFG; not especially relevant here
It will always create two universal variables, one representing
'static
and one representing '!1
. Let's call them Vs and V1. They
will have initial values like so:
Vs = { CFG; end('static) } // it is in U0, so can't name anything else
V1 = { placeholder(1) }
From the subtyping constraint above, we would have an outlives constraint like
'!1: 'static @ P
To process this, we would grow the value of V1 to include all of Vs:
Vs = { CFG; end('static) }
V1 = { CFG; end('static), placeholder(1) }
At that point, constraint propagation is complete, because all the outlives relationships are satisfied. Then we would go to the "check universal regions" portion of the code, which would test that no universal region grew too large.
In this case, V1
did grow too large – it is not known to outlive
end('static)
, nor any of the CFG – so we would report an error.
Another example
What about this subtyping relationship?
for<'a> fn(&'a u32, &'a u32)
<:
for<'b, 'c> fn(&'b u32, &'c u32)
Here we would replace the bound region in the supertype with a placeholder, as before, yielding:
for<'a> fn(&'a u32, &'a u32)
<:
fn(&'!1 u32, &'!2 u32)
then we instantiate the variable on the left-hand side with an
existential in universe U2, yielding the following (?n
is a notation
for an existential variable):
fn(&'?3 u32, &'?3 u32)
<:
fn(&'!1 u32, &'!2 u32)
Then we break this down further:
&'!1 u32 <: &'?3 u32
&'!2 u32 <: &'?3 u32
and even further, yield up our region constraints:
'!1: '?3
'!2: '?3
Note that, in this case, both '!1
and '!2
have to outlive the
variable '?3
, but the variable '?3
is not forced to outlive
anything else. Therefore, it simply starts and ends as the empty set
of elements, and hence the type-check succeeds here.
(This should surprise you a little. It surprised me when I first realized it.
We are saying that if we are a fn that needs both of its arguments to have
the same region, we can accept being called with arguments with two
distinct regions. That seems intuitively unsound. But in fact, it's fine, as
I tried to explain in this issue on the Rust issue
tracker long ago. The reason is that even if we get called with arguments of
two distinct lifetimes, those two lifetimes have some intersection (the call
itself), and that intersection can be our value of 'a
that we use as the
common lifetime of our arguments. -nmatsakis)
Final example
Let's look at one last example. We'll extend the previous one to have a return type:
for<'a> fn(&'a u32, &'a u32) -> &'a u32
<:
for<'b, 'c> fn(&'b u32, &'c u32) -> &'b u32
Despite seeming very similar to the previous example, this case is going to get an error. That's good: the problem is that we've gone from a fn that promises to return one of its two arguments, to a fn that is promising to return the first one. That is unsound. Let's see how it plays out.
First, we replace the bound region in the supertype with a placeholder:
for<'a> fn(&'a u32, &'a u32) -> &'a u32
<:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32
Then we instantiate the subtype with existentials (in U2):
fn(&'?3 u32, &'?3 u32) -> &'?3 u32
<:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32
And now we create the subtyping relationships:
&'!1 u32 <: &'?3 u32 // arg 1
&'!2 u32 <: &'?3 u32 // arg 2
&'?3 u32 <: &'!1 u32 // return type
And finally the outlives relationships. Here, let V1, V2, and V3 be the
variables we assign to !1
, !2
, and ?3
respectively:
V1: V3
V2: V3
V3: V1
Those variables will have these initial values:
V1 in U1 = {placeholder(1)}
V2 in U2 = {placeholder(2)}
V3 in U2 = {}
Now because of the V3: V1
constraint, we have to add placeholder(1)
into V3
(and
indeed it is visible from V3
), so we get:
V3 in U2 = {placeholder(1)}
then we have this constraint V2: V3
, so we wind up having to enlarge
V2
to include placeholder(1)
(which it can also see):
V2 in U2 = {placeholder(1), placeholder(2)}
Now constraint propagation is done, but when we check the outlives
relationships, we find that V2
includes this new element placeholder(1)
,
so we report an error.
Borrow Checker Errors
TODO: we should discuss how to generate errors from the results of these analyses.
Two-phase borrows
Two-phase borrows are a more permissive version of mutable borrows that allow
nested method calls such as vec.push(vec.len())
. Such borrows first act as
shared borrows in a "reservation" phase and can later be "activated" into a
full mutable borrow.
Only certain implicit mutable borrows can be two-phase; any `&mut` or `ref mut` in the source code is never a two-phase borrow. The cases where we generate a two-phase borrow are:
- The autoref borrow when calling a method with a mutable reference receiver.
- A mutable reborrow in function arguments.
- The implicit mutable borrow in an overloaded compound assignment operator.
To give some examples:
```rust
// In the source code

// Case 1:
let mut v = Vec::new();
v.push(v.len());
let r = &mut Vec::new();
r.push(r.len());

// Case 2:
std::mem::replace(r, vec![1, r.len()]);

// Case 3:
let mut x = std::num::Wrapping(2);
x += x;
```
Expanding these enough to show the two-phase borrows:
// Case 1:
let mut v = Vec::new();
let temp1 = &two_phase v;
let temp2 = v.len();
Vec::push(temp1, temp2);
let r = &mut Vec::new();
let temp3 = &two_phase *r;
let temp4 = r.len();
Vec::push(temp3, temp4);
// Case 2:
let temp5 = &two_phase *r;
let temp6 = vec![1, r.len()];
std::mem::replace(temp5, temp6);
// Case 3:
let mut x = std::num::Wrapping(2);
let temp7 = &two_phase x;
let temp8 = x;
std::ops::AddAssign::add_assign(temp7, temp8);
Whether a borrow can be two-phase is tracked by a flag on the AutoBorrow
after type checking, which is then converted to a BorrowKind
during MIR
construction.
Each two-phase borrow is assigned to a temporary that is only used once. As such we can define:
- The point where the temporary is assigned to is called the reservation point of the two-phase borrow.
- The point where the temporary is used, which is effectively always a function call, is called the activation point.
The activation points are found using the GatherBorrows
visitor. The
BorrowData
then holds both the reservation and activation points for the
borrow.
Checking two-phase borrows
Two-phase borrows are treated as if they were mutable borrows with the following exceptions:
- At every location in the MIR we check if any two-phase borrows are activated at this location. If a live two-phase borrow is activated at a location, then we check that there are no borrows that conflict with the two-phase borrow.
- At the reservation point we error if there are conflicting live mutable borrows, and we lint if there are any conflicting shared borrows.
- Between the reservation and the activation point, the two-phase borrow acts as a shared borrow. We determine (in `is_active`) if we're at such a point by using the `Dominators` for the MIR graph.
- After the activation point, the two-phase borrow acts as a mutable borrow.
Constant Evaluation
Constant evaluation is the process of computing values at compile time. For a specific item (constant/static/array length) this happens after the MIR for the item is borrow-checked and optimized. In many cases trying to const evaluate an item will trigger the computation of its MIR for the first time.
Prominent examples are
- The initializer of a
static
- Array length
- needs to be known to reserve stack or heap space
- Enum variant discriminants
- needs to be known to prevent two variants from having the same discriminant
- Patterns
- need to be known to check for overlapping patterns
Additionally, constant evaluation can be used to reduce the workload or binary size at runtime by precomputing complex operations at compile time and only storing the result.
Constant evaluation can be done by calling the const_eval
query of TyCtxt
.
The `const_eval` query takes a `ParamEnv` of the environment in which the constant is evaluated (e.g. the function within which the constant is used) and a `GlobalId`. The `GlobalId` is made up of an `Instance` referring to a constant or static, or of an `Instance` of a function and an index into the function's `Promoted` table.
Constant evaluation returns a Result
with either the error, or the simplest
representation of the constant. "simplest" meaning if it is representable as an
integer or fat pointer, it will directly yield the value (via ConstValue::Scalar
or
ConstValue::ScalarPair
), instead of referring to the miri
virtual
memory allocation (via ConstValue::ByRef
). This means that the const_eval
function cannot be used to create miri-pointers to the evaluated constant or
static. If you need that, you need to directly work with the functions in
src/librustc_mir/const_eval.rs.
Miri
Miri (MIR Interpreter) is a virtual machine for executing MIR without
compiling to machine code. It is usually invoked via tcx.const_eval
.
If you start out with a constant
```rust
const FOO: usize = 1 << 12;
```
rustc doesn't actually invoke anything until the constant is either used or placed into metadata.
Once you have a use-site like
type Foo = [u8; FOO - 42];
The compiler needs to figure out the length of the array before being able to create items that use the type (locals, constants, function arguments, ...).
To obtain the (in this case empty) parameter environment, one can call
let param_env = tcx.param_env(length_def_id);
. The GlobalId
needed is
let gid = GlobalId {
promoted: None,
instance: Instance::mono(length_def_id),
};
Invoking tcx.const_eval(param_env.and(gid))
will now trigger the creation of
the MIR of the array length expression. The MIR will look something like this:
const Foo::{{initializer}}: usize = {
let mut _0: usize; // return pointer
let mut _1: (usize, bool);
bb0: {
_1 = CheckedSub(const Unevaluated(FOO, Slice([])), const 42usize);
assert(!(_1.1: bool), "attempt to subtract with overflow") -> bb1;
}
bb1: {
_0 = (_1.0: usize);
return;
}
}
Before the evaluation, a virtual memory location (in this case essentially a
vec![u8; 4]
or vec![u8; 8]
) is created for storing the evaluation result.
At the start of the evaluation, _0
and _1
are
ConstValue::Scalar(Scalar::Undef)
. When the initialization of _1
is invoked, the
value of the FOO
constant is required, and triggers another call to
tcx.const_eval
, which will not be shown here. If the evaluation of FOO is
successful, 42 will be subtracted from its value `4096`, and the result stored in `_1` as `ConstValue::ScalarPair(Scalar::Bytes(4054), Scalar::Bytes(0))`. The first part of the pair is the computed value, and the second part is a bool that is true if an overflow happened.
The next statement asserts that said boolean is 0
. In case the assertion
fails, its error message is used for reporting a compile-time error.
Since it does not fail, `ConstValue::Scalar(Scalar::Bytes(4054))` is stored in the virtual memory that was allocated before the evaluation. `_0` always refers to that location directly.
After the evaluation is done, the virtual memory allocation is interned into the
TyCtxt
. Future evaluations of the same constants will not actually invoke
miri, but just extract the value from the interned allocation.
The tcx.const_eval
function has one additional feature: it will not return a
ByRef(interned_allocation_id)
, but a Scalar(computed_value)
if possible. This
makes using the result much more convenient, as no further queries need to be
executed in order to get at something as simple as a usize
.
Datastructures
Miri's core datastructures can be found in
librustc/mir/interpret.
This is mainly the error enum and the ConstValue
and Scalar
types. A ConstValue
can
be either Scalar
(a single Scalar
), ScalarPair
(two Scalar
s, usually fat
pointers or two element tuples) or ByRef
, which is used for anything else and
refers to a virtual allocation. These allocations can be accessed via the
methods on tcx.interpret_interner
.
If you are expecting a numeric result, you can use `unwrap_usize` (panics on anything that can't be represented as a `u64`) or `assert_usize`, which results in an `Option<u128>` yielding the `Scalar` if possible.
Allocations
A miri allocation is either a byte sequence of the memory or an Instance
in
the case of function pointers. Byte sequences can additionally contain
relocations that mark a group of bytes as a pointer to another allocation. The
actual bytes at the relocation refer to the offset inside the other allocation.
These allocations exist so that references and raw pointers have something to
point to. There is no global linear heap in which things are allocated, but each
allocation (be it for a local variable, a static or a (future) heap allocation)
gets its own little memory with exactly the required size. So if you have a
pointer to an allocation for a local variable a
, there is no possible (no
matter how unsafe) operation that you can do that would ever change said pointer
to a pointer to b
.
Interpretation
Although the main entry point to constant evaluation is the tcx.const_eval
query, there are additional functions in
librustc_mir/const_eval.rs
that allow accessing the fields of a ConstValue
(ByRef
or otherwise). You should
never have to access an Allocation
directly except for translating it to the
compilation target (at the moment just LLVM).
Miri starts by creating a virtual stack frame for the current constant that is being evaluated. There's essentially no difference between a constant and a function with no arguments, except that constants do not allow local (named) variables at the time of writing this guide.
A stack frame is defined by the Frame
type in
librustc_mir/interpret/eval_context.rs
and contains all the local
variables memory (None
at the start of evaluation). Each frame refers to the
evaluation of either the root constant or subsequent calls to const fn
. The
evaluation of another constant simply calls tcx.const_eval
, which produces an
entirely new and independent stack frame.
The frames are just a `Vec<Frame>`; there's no way to actually refer to a `Frame`'s memory, even if horrible shenanigans are done via unsafe code. The only memory that can be referred to are `Allocation`s.
Miri now calls the step
method (in
librustc_mir/interpret/step.rs
) until it either returns an error or has no further statements to execute. Each
statement will now initialize or modify the locals or the virtual memory
referred to by a local. This might require evaluating other constants or
statics, which just recursively invokes tcx.const_eval
.
Parameter Environment
When working with associated and/or generic items (types, constants, functions/methods) it is often relevant to have more information about the `Self` or generic parameters. Trait bounds and similar information is encoded in
the ParamEnv
. Often this is not enough information to obtain things like the
type's Layout
, but you can do all kinds of other checks on it (e.g. whether a
type implements Copy
) or you can evaluate an associated constant whose value
does not depend on anything from the parameter environment.
For example if you have a function
```rust
fn foo<T: Copy>(t: T) {
}
```
the parameter environment for that function is [T: Copy]
. This means any
evaluation within this function will, when accessing the type T
, know about
its Copy
bound via the parameter environment.
Although you can obtain a valid ParamEnv
for any item via
tcx.param_env(def_id)
, this ParamEnv
can be too generic for your use case.
Using the ParamEnv
from the surrounding context can allow you to evaluate more
things.
Another great thing about ParamEnv
is that you can use it to bundle the thing
depending on generic parameters (e.g. a Ty
) by calling param_env.and(ty)
.
This will produce a ParamEnvAnd<Ty>
, making clear that you should probably not
be using the inner value without taking care to also use the ParamEnv
.
Code generation
Code generation or "codegen" is the part of the compiler that actually generates an executable binary. rustc uses LLVM for code generation.
NOTE: If you are looking for hints on how to debug code generation bugs, please see this section of the debugging chapter.
What is LLVM?
All of the preceding chapters of this guide have one thing in common: we never generated any executable machine code at all! With this chapter, all of that changes.
Like most compilers, rustc is composed of a "frontend" and a "backend". The
"frontend" is responsible for taking raw source code, checking it for
correctness, and getting it into a format X
from which we can generate
executable machine code. The "backend" then takes that format X
and produces
(possibly optimized) executable machine code for some platform. All of the
previous chapters deal with rustc's frontend.
rustc's backend is LLVM, "a collection of modular and
reusable compiler and toolchain technologies". In particular, the LLVM project
contains a pluggable compiler backend (also called "LLVM"), which is used by
many compiler projects, including the clang
C compiler and our beloved
rustc
.
LLVM's "format X
" is called LLVM IR. It is basically assembly code with
additional low-level types and annotations added. These annotations are helpful
for doing optimizations on the LLVM IR and outputted machine code. The end
result of all this is (at long last) something executable (e.g. an ELF object
or wasm).
There are a few benefits to using LLVM:
- We don't have to write a whole compiler backend. This reduces implementation and maintenance burden.
- We benefit from the large suite of advanced optimizations that the LLVM project has been collecting.
- We automatically can compile Rust to any of the platforms for which LLVM has support. For example, as soon as LLVM added support for wasm, voila! rustc, clang, and a bunch of other languages were able to compile to wasm! (Well, there was some extra stuff to be done, but we were 90% there anyway).
- We and other compiler projects benefit from each other. For example, when the Spectre and Meltdown security vulnerabilities were discovered, only LLVM needed to be patched.
Generating LLVM IR
TODO
Updating LLVM
The Rust compiler uses LLVM as its primary codegen backend today, and naturally we want to at least occasionally update this dependency! Currently we do not have a strict policy about when to update LLVM or what it can be updated to, but a few guidelines are applied:
- We try to always support the latest released version of LLVM
- We try to support the "last few" versions of LLVM (how many is changing over time)
- We allow moving to arbitrary commits during development.
- Strongly prefer to upstream all patches to LLVM before including them in rustc.
This policy may change over time (or may actually start to exist as a formal policy!), but for now these are rough guidelines!
Why update LLVM?
There are two primary reasons nowadays that we want to update LLVM in one way or another:
-
First, a bug could have been fixed! Often we find bugs in the compiler and fix them upstream in LLVM. We'll want to pull fixes back to the compiler itself as they're merged upstream.
-
Second, a new feature may be available in LLVM that we want to use in rustc, but we don't want to wait for a full LLVM release to test it out.
Each of these reasons has a different strategy for updating LLVM, and we'll go over both in detail here.
Bugfix Updates
For updates of LLVM that typically just update a bug, we cherry-pick the bugfix to the branch we're already using. The steps for this are:
- Make sure the bugfix is in upstream LLVM.
- Identify the branch that rustc is currently using. The `src/llvm-project` submodule is always pinned to a branch of the rust-lang/llvm-project repository.
- Fork the rust-lang/llvm-project repository.
- Check out the appropriate branch (typically named `rustc/a.b-yyyy-mm-dd`).
- Cherry-pick the upstream commit onto the branch.
- Push this branch to your fork.
- Send a Pull Request to rust-lang/llvm-project to the same branch as before.
- Wait for the PR to be merged.
- Send a PR to rust-lang/rust updating the `src/llvm-project` submodule with your bugfix.
- Wait for that PR to be merged.
The tl;dr is that we can cherry-pick bugfixes at any time and pull them back into the rust-lang/llvm-project branch that we're using, and getting the fix into the compiler is just a matter of updating the submodule via a PR!
Example PRs look like: #59089
Feature updates
Note that this is all information as it applies to the current day and age. This process for updating LLVM changes with practically every LLVM update, so this may be out of date!
Unlike bugfixes, updating to pick up a new feature of LLVM typically requires a lot more work. This is where we can't reasonably cherry-pick commits backwards so we need to do a full update. There's a lot of stuff to do here, so let's go through each in detail.
-
Create a new branch in the rust-lang/llvm-project repository. This branch should be named
rustc/a.b-yyyy-mm-dd
wherea.b
is the current version number of LLVM in-tree at the time of the branch and the remaining part is today's date. -
Apply Rust-specific patches to the llvm-project repository. All features and bugfixes are upstream, but there's often some weird build-related patches that don't make sense to upstream which we have on our repositories. These patches are around the latest patches in the rust-lang/llvm-project branch that rustc is currently using.
-
Update the `compiler-rt` submodule in the `rust-lang-nursery/compiler-builtins` repository. Push this update to a branch of the `rust-lang/compiler-rt` repository with the same name as the `llvm-project` branch, and then push it to a same-named branch of `compiler-builtins`. Note that this step is frequently optional, since we may not need to update `compiler-rt`.
Prepare a commit to rust-lang/rust
- Update
src/llvm-project
- Update
compiler-builtins
crate inCargo.lock
(if necessary)
-
Build your commit. Make sure you've committed the previous changes to ensure submodule updates aren't reverted. Some commands you should execute are:
  - `./x.py build src/llvm` – test that LLVM still builds
  - `./x.py build src/tools/lld` – same for LLD
  - `./x.py build` – build the rest of rustc

  You'll likely need to update `src/rustllvm/*.cpp` to compile with updated LLVM bindings. Note that you should use `#ifdef` and such to ensure that the bindings still compile on older LLVM versions.
Test for regressions across other platforms. LLVM often has at least one bug for non-tier-1 architectures, so it's good to do some more testing before sending this to bors! If you're low on resources you can send the PR as-is now to bors, though, and it'll get tested anyway.
Ideally, build LLVM and test it on a few platforms:
- Linux
- OSX
- Windows
and afterwards run some docker containers that CI also does:
./src/ci/docker/run.sh wasm32-unknown
./src/ci/docker/run.sh arm-android
./src/ci/docker/run.sh dist-various-1
./src/ci/docker/run.sh dist-various-2
./src/ci/docker/run.sh armhf-gnu
-
Send a PR! Hopefully it's smooth sailing from here :).
For prior art, previous LLVM updates look like #55835 #47828
Caveats and gotchas
Ideally the above instructions are pretty smooth, but here's some caveats to keep in mind while going through them:
- LLVM bugs are hard to find, don't hesitate to ask for help! Bisection is definitely your friend here (yes LLVM takes forever to build, yet bisection is still your friend)
- Updating LLDB has some Rust-specific patches currently that aren't upstream. If you have difficulty @tromey can likely help out.
- If you've got general questions, @alexcrichton can help you out.
- Creating branches is a privileged operation on GitHub, so you'll need someone with write access to create the branches for you most likely.
Debugging LLVM
NOTE: If you are looking for info about code generation, please see this chapter instead.
This section is about debugging compiler bugs in code generation (e.g. why the compiler generated some piece of code or crashed in LLVM). LLVM is a big project on its own that probably needs to have its own debugging document (not that I could find one). But here are some tips that are important in a rustc context:
As a general rule, compilers generate lots of information from analyzing code. Thus, a useful first step is usually to find a minimal example. One way to do this is to
-
create a new crate that reproduces the issue (e.g. adding whatever crate is at fault as a dependency, and using it from there)
-
minimize the crate by removing external dependencies; that is, moving everything relevant to the new crate
-
further minimize the issue by making the code shorter (there are tools that help with this like
creduce
)
The official compilers (including nightlies) have LLVM assertions disabled,
which means that LLVM assertion failures can show up as compiler crashes (not
ICEs but "real" crashes) and other sorts of weird behavior. If you are
encountering these, it is a good idea to try using a compiler with LLVM
assertions enabled - either an "alt" nightly or a compiler you build yourself
by setting [llvm] assertions=true
in your config.toml - and see whether
anything turns up.
The rustc build process builds the LLVM tools into
./build/<host-triple>/llvm/bin
. They can be called directly.
The default rustc compilation pipeline has multiple codegen units, which is
hard to replicate manually and means that LLVM is called multiple times in
parallel. If you can get away with it (i.e. if it doesn't make your bug
disappear), passing -C codegen-units=1
to rustc will make debugging easier.
To get rustc to generate LLVM IR, you need to pass the `--emit=llvm-ir` flag. If you are building via cargo, use the `RUSTFLAGS` environment variable (e.g. `RUSTFLAGS='--emit=llvm-ir'`). This causes rustc to spit out LLVM IR into the target directory.
`cargo llvm-ir [options] path` spits out the LLVM IR for a particular function at `path`. (`cargo install cargo-asm` installs `cargo asm` and `cargo llvm-ir`). `--build-type=debug` emits code for debug builds. There are also other useful options. Also, debug info in LLVM IR can clutter the output a lot: `RUSTFLAGS="-C debuginfo=0"` is really useful.
`RUSTFLAGS="-C save-temps"` outputs LLVM bitcode (not the same as IR) at different stages during compilation, which is sometimes useful. One just needs to convert the bitcode files to `.ll` files using `llvm-dis`, which can be found in the LLVM tools directory of a local rustc build (`./build/<host-triple>/llvm/bin`).
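As a sketch (the `.bc` file name below is illustrative; `save-temps` produces several bitcode files per codegen unit, one per stage):

```
$ RUSTFLAGS='-C save-temps' cargo build
$ ./build/$TRIPLE/llvm/bin/llvm-dis my_crate.some_cgu.no-opt.bc -o my_crate.ll
```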
If you want to play with the optimization pipeline, you can use the `opt` tool from `./build/<host-triple>/llvm/bin/` with the LLVM IR emitted by rustc. Note that rustc emits different IR depending on whether `-O` is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, you should:

```
$ rustc +local my-file.rs --emit=llvm-ir -O -C no-prepopulate-passes \
    -C codegen-units=1
$ OPT=./build/$TRIPLE/llvm/bin/opt
$ $OPT -S -O2 < my-file.ll > my
```
If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which IR causes an optimization-time assertion to fail, or to see when LLVM performs a particular optimization, you can pass the rustc flag `-C llvm-args=-print-after-all`, and possibly add `-C llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME'` (e.g. `-C llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE'`).

That produces a lot of output into standard error, so you'll want to pipe that to some file. Also, if you are using neither `-filter-print-funcs` nor `-C codegen-units=1`, then, because the multiple codegen units run in parallel, the printouts will mix together and you won't be able to read anything.
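Concretely, such a session might look like this (the file names are arbitrary):

```
$ rustc my-file.rs --emit=llvm-ir -C codegen-units=1 \
    -C llvm-args=-print-after-all 2> llvm-print.log
```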
If you want just the IR for a specific function (say, you want to see why it causes an assertion or doesn't optimize correctly), you can use `llvm-extract`, e.g.

```
$ ./build/$TRIPLE/llvm/bin/llvm-extract \
    -func='_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE' \
    -S \
    < unextracted.ll \
    > extracted.ll
```
Filing LLVM bug reports
When filing an LLVM bug report, you will probably want some sort of minimal working example that demonstrates the problem. The Godbolt compiler explorer is really helpful for this.
- Once you have some LLVM IR for the problematic code (see above), you can create a minimal working example with Godbolt. Go to gcc.godbolt.org.
- Choose `LLVM-IR` as programming language.
- Use `llc` to compile the IR to a particular target as is:
  - There are some useful flags: `-mattr` enables target features, `-march=` selects the target, `-mcpu=` selects the CPU, etc.
  - Commands like `llc -march=help` output all architectures available, which is useful because sometimes the Rust arch names and the LLVM names do not match.
  - If you have compiled rustc yourself somewhere, in the target directory you have binaries for `llc`, `opt`, etc.
- If you want to optimize the LLVM-IR, you can use `opt` to see how the LLVM optimizations transform it.
- Once you have a godbolt link demonstrating the issue, it is pretty easy to file an LLVM bug.
Profile Guided Optimization
`rustc` supports doing profile-guided optimization (PGO). This chapter describes what PGO is and how the support for it is implemented in `rustc`.
What Is Profile-Guided Optimization?
The basic concept of PGO is to collect data about the typical execution of a program (e.g. which branches it is likely to take) and then use this data to inform optimizations such as inlining, machine-code layout, register allocation, etc.
There are different ways of collecting data about a program's execution. One is to run the program inside a profiler (such as `perf`) and another is to create an instrumented binary, that is, a binary that has data collection built into it, and run that. The latter usually provides more accurate data.
How is PGO implemented in `rustc`?

`rustc`'s current PGO implementation relies entirely on LLVM.
LLVM actually supports multiple forms of PGO:

- Sampling-based PGO, where an external profiling tool like `perf` is used to collect data about a program's execution.
- GCOV-based profiling, where code coverage infrastructure is used to collect profiling information.
- Front-end based instrumentation, where the compiler front-end (e.g. Clang) inserts instrumentation intrinsics into the LLVM IR it generates.
- IR-level instrumentation, where LLVM inserts the instrumentation intrinsics itself during optimization passes.
`rustc` supports only the last approach, IR-level instrumentation, mainly because it is almost exclusively implemented in LLVM and needs little maintenance on the Rust side. Fortunately, it is also the most modern approach, yielding the best results.
So, we are dealing with an instrumentation-based approach, i.e. profiling data is generated by a specially instrumented version of the program that's being optimized. Instrumentation-based PGO has two components: a compile-time component and run-time component, and one needs to understand the overall workflow to see how they interact.
Overall Workflow
Generating a PGO-optimized program involves the following four steps:

1. Compile the program with instrumentation enabled (e.g. `rustc -Cprofile-generate main.rs`)
2. Run the instrumented program (e.g. `./main`), which generates a `default-<id>.profraw` file
3. Convert the `.profraw` file into a `.profdata` file using LLVM's `llvm-profdata` tool
4. Compile the program again, this time making use of the profiling data (e.g. `rustc -Cprofile-use=merged.profdata main.rs`)
Compile-Time Aspects
Depending on which step in the above workflow we are in, two different things can happen at compile time:
Create Binaries with Instrumentation
As mentioned above, the profiling instrumentation is added by LLVM. `rustc` instructs LLVM to do so by setting the appropriate flags when creating LLVM `PassManager`s:
```cpp
// `PMBR` is an `LLVMPassManagerBuilderRef`
unwrap(PMBR)->EnablePGOInstrGen = true;
// Instrumented binaries have a default output path for the `.profraw` file
// hard-coded into them:
unwrap(PMBR)->PGOInstrGen = PGOGenPath;
```
`rustc` also has to make sure that some of the symbols from LLVM's profiling runtime are not removed by marking them with the right export level.
Compile Binaries Where Optimizations Make Use Of Profiling Data
In the final step of the workflow described above, the program is compiled again, with the compiler using the gathered profiling data in order to drive optimization decisions. `rustc` again leaves most of the work to LLVM here, basically just telling the LLVM `PassManagerBuilder` where the profiling data can be found:
```cpp
unwrap(PMBR)->PGOInstrUse = PGOUsePath;
```
LLVM does the rest (e.g. setting branch weights, marking functions with `cold` or `inlinehint`, etc).
Runtime Aspects
Instrumentation-based approaches always also have a runtime component, i.e. once we have an instrumented program, that program needs to be run in order to generate profiling data, and collecting and persisting this profiling data needs some infrastructure in place.
In the case of LLVM, these runtime components are implemented in
compiler-rt and statically linked into any instrumented
binaries.
The `rustc` version of this can be found in `src/libprofiler_builtins`, which basically packs the C code from `compiler-rt` into a Rust crate.
In order for `libprofiler_builtins` to be built, `profiler = true` must be set in `rustc`'s `config.toml`.
Testing PGO
Since the PGO workflow spans multiple compiler invocations, most testing happens in run-make tests (the relevant tests have `pgo` in their name). There is also a codegen test that checks that some expected instrumentation artifacts show up in LLVM IR.
Additional Information
Clang's documentation contains a good overview on PGO in LLVM here: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
Appendix A: A tutorial on creating a drop-in replacement for rustc
> Note: This is a copy of @nrc's amazing stupid-stats. You should find a copy of the code on the GitHub repository.
>
> Due to the compiler's constantly evolving nature, the `rustc_driver` mechanisms described in this chapter have changed. In particular, the `CompilerCalls` and `CompileController` types have been replaced by `Callbacks`. Also, there is a new query-based interface in the `rustc_interface` crate. See The Rustc Driver and Interface for more information.
Many tools benefit from being a drop-in replacement for a compiler. By this, I mean that any user of the tool can use `mytool` in all the ways they would normally use `rustc` - whether manually compiling a single file or as part of a complex make project or Cargo build, etc. That could be a lot of work; rustc, like most compilers, takes a large number of command line arguments which can affect compilation in complex and interacting ways. Emulating all of this behaviour in your tool is annoying at best, especially if you are making many of the same calls into librustc that the compiler is.
The kind of things I have in mind are tools like rustdoc or a future rustfmt. These want to operate as closely as possible to real compilation, but have totally different outputs (documentation and formatted source code, respectively). Another use case is a customised compiler. Say you want to add a custom code generation phase after macro expansion, then creating a new tool should be easier than forking the compiler (and keeping it up to date as the compiler evolves).
I have gradually been trying to improve the API of librustc to make creating a drop-in tool easier to produce (many others have also helped improve these interfaces over the same time frame). It is now pretty simple to make a tool which is as close to rustc as you want it to be. In this tutorial I'll show how.
Note/warning, everything I talk about in this tutorial is internal API for rustc. It is all extremely unstable and likely to change often and in unpredictable ways. Maintaining a tool which uses these APIs will be non- trivial, although hopefully easier than maintaining one that does similar things without using them.
This tutorial starts with a very high level view of the rustc compilation process and of some of the code that drives compilation. Then I'll describe how that process can be customised. In the final section of the tutorial, I'll go through an example - stupid-stats - which shows how to build a drop-in tool.
Overview of the compilation process
Compilation using rustc happens in several phases. We start with parsing, this includes lexing. The output of this phase is an AST (abstract syntax tree). There is a single AST for each crate (indeed, the entire compilation process operates over a single crate). Parsing abstracts away details about individual files which will all have been read in to the AST in this phase. At this stage the AST includes all macro uses, attributes will still be present, and nothing will have been eliminated due to `cfg`s.
The next phase is configuration and macro expansion. This can be thought of as a function over the AST. The unexpanded AST goes in and an expanded AST comes out. Macros and syntax extensions are expanded, and `cfg` attributes will cause some code to disappear. The resulting AST won't have any macros or macro uses left in.
The code for these first two phases is in libsyntax.
After this phase, the compiler allocates ids to each node in the AST (technically not every node, but most of them). If we are writing out dependencies, that happens now.
The next big phase is analysis. This is the most complex phase and uses the bulk of the code in rustc. This includes name resolution, type checking, borrow checking, type and lifetime inference, trait selection, method selection, linting, and so forth. Most error detection is done in this phase (although parse errors are found during parsing). The 'output' of this phase is a bunch of side tables containing semantic information about the source program. The analysis code is in librustc and a bunch of other crates with the 'librustc_' prefix.
Next is translation, this translates the AST (and all those side tables) into LLVM IR (intermediate representation). We do this by calling into the LLVM libraries, rather than actually writing IR directly to a file. The code for this is in librustc_trans.
The next phase is running the LLVM backend. This runs LLVM's optimisation passes on the generated IR and then generates machine code. The result is object files. This phase is all done by LLVM, it is not really part of the rust compiler. The interface between LLVM and rustc is in librustc_llvm.
Finally, we link the object files into an executable. Again we outsource this to other programs and it's not really part of the rust compiler. The interface is in librustc_back (which also contains some things used primarily during translation).
> NOTE: `librustc_trans` and `librustc_back` no longer exist, and we don't translate AST or HIR directly to LLVM IR anymore. Instead, see `librustc_codegen_llvm` and `librustc_codegen_utils`.
All these phases are coordinated by the driver. To see the exact sequence, look at the `compile_input` function in `librustc_driver`.
The driver handles all the highest level coordination of compilation -

1. handling command-line arguments
2. maintaining compilation state (primarily in the `Session`)
3. calling the appropriate code to run each phase of compilation
4. handling high level coordination of pretty printing and testing
To create a drop-in compiler replacement or a custom compiler, we leave most of compilation alone and customise the driver using its APIs.
The driver customisation APIs
There are two primary ways to customise compilation - high level control of the driver using `CompilerCalls`, and controlling each phase of compilation using a `CompileController`. The former lets you customise handling of command line arguments etc., the latter lets you stop compilation early or execute code between phases.
CompilerCalls
`CompilerCalls` is a trait that you implement in your tool. It contains a fairly ad-hoc set of methods to hook in to the process of processing command line arguments and driving the compiler. For details, see the comments in librustc_driver/lib.rs. I'll summarise the methods here.
`early_callback` and `late_callback` let you call arbitrary code at different points - early is after command line arguments have been parsed, but before anything is done with them; late is pretty much the last thing before compilation starts, i.e., after all processing of command line arguments, etc. is done. Currently, you get to choose whether compilation stops or continues at each point, but you don't get to change anything the driver has done. You can record some info for later, or perform other actions of your own.
`some_input` and `no_input` give you an opportunity to modify the primary input to the compiler (usually the input is a file containing the top module for a crate, but it could also be a string). You could record the input or perform other actions of your own.
Ignore `parse_pretty`, it is unfortunate and hopefully will get improved. There is a default implementation, so you can pretend it doesn't exist.
`build_controller` returns a `CompileController` object for more fine-grained control of compilation; it is described next.

We might add more options in the future.
CompileController

`CompileController` is a struct consisting of `PhaseController`s and flags. Currently, there is only one flag, `make_glob_map`, which signals whether to produce a map of glob imports (used by save-analysis and potentially other tools). There are probably flags in the session that should be moved here.
There is a `PhaseController` for each of the phases described in the above summary of compilation (and we could add more in the future for finer-grained control). They are all `after_` a phase because they are checked at the end of a phase (again, that might change), e.g., `CompileController::after_parse` controls what happens immediately after parsing (and before macro expansion).
Each `PhaseController` contains a flag called `stop` which indicates whether compilation should stop or continue, and a callback to be executed at the point indicated by the phase. The callback is called whether or not compilation continues.
Information about the state of compilation is passed to these callbacks in a `CompileState` object. This contains all the information the compiler has. Note that this state information is immutable - your callback can only execute code using the compiler state, it can't modify the state. (If there is demand, we could change that). The state available to a callback depends on where during compilation the callback is called. For example, after parsing there is an AST but no semantic analysis (because the AST has not been analysed yet). After translation, there is translation info, but no AST or analysis info (since these have been consumed/forgotten).
An example - stupid-stats
Our example tool is very simple, it simply collects some simple and not very useful statistics about a program; it is called stupid-stats. You can find the (more heavily commented) complete source for the example on Github. To build, just do `cargo build`. To run on a file `foo.rs`, do `cargo run foo.rs` (assuming you have a Rust program called `foo.rs`; you can also pass any command line arguments that you would normally pass to rustc). When you run it you'll see output similar to
```
In crate: foo,

Found 12 uses of `println!`;
The most common number of arguments is 1 (67% of all functions);
25% of functions have four or more arguments.
```
To make things easier, when we talk about functions, we're excluding methods and closures.
You can also use the executable as a drop-in replacement for rustc, because after all, that is the whole point of this exercise. So, however you use rustc in your makefile setup, you can use `target/stupid` (or whatever executable you end up with) instead. That might mean setting an environment variable or it might mean renaming your executable to `rustc` and setting your PATH. Similarly, if you're using Cargo, you'll need to rename the executable to rustc and set the PATH. Alternatively, you should be able to use multirust to get around all the PATH stuff (although I haven't actually tried that).
(Note that this example prints to stdout. I'm not entirely sure what Cargo does with stdout from rustc under different circumstances. If you don't see any output, try inserting a `panic!` after the `println!`s to error out, then Cargo should dump stupid-stats' stdout to Cargo's stdout).
Let's start with the `main` function for our tool, it is pretty simple:
```rust
fn main() {
    let args: Vec<_> = std::env::args().collect();
    rustc_driver::run_compiler(&args, &mut StupidCalls::new());
    std::env::set_exit_status(0);
}
```
The first line grabs any command line arguments. The second line calls the compiler driver with those arguments. The final line sets the exit code for the program.
The only interesting thing is the `StupidCalls` object we pass to the driver. This is our implementation of the `CompilerCalls` trait and is what will make this tool different from rustc.
`StupidCalls` is a mostly empty struct:
```rust
struct StupidCalls {
    default_calls: RustcDefaultCalls,
}
```
This tool is so simple that it doesn't need to store any data here, but usually you would. We embed a `RustcDefaultCalls` object to delegate to in our impl when we want exactly the same behaviour as the Rust compiler. Mostly you don't want to do that (or at least don't need to) in a tool. However, Cargo calls rustc with the `--print file-names` flag, so we delegate in `late_callback` and `no_input` to keep Cargo happy.
Most of the rest of the impl of `CompilerCalls` is trivial:
```rust
impl<'a> CompilerCalls<'a> for StupidCalls {
    fn early_callback(&mut self,
                      _: &getopts::Matches,
                      _: &config::Options,
                      _: &diagnostics::registry::Registry,
                      _: ErrorOutputType)
                      -> Compilation {
        Compilation::Continue
    }

    fn late_callback(&mut self,
                     t: &TransCrate,
                     m: &getopts::Matches,
                     s: &Session,
                     c: &CrateStore,
                     i: &Input,
                     odir: &Option<PathBuf>,
                     ofile: &Option<PathBuf>)
                     -> Compilation {
        self.default_calls.late_callback(t, m, s, c, i, odir, ofile);
        Compilation::Continue
    }

    fn some_input(&mut self,
                  input: Input,
                  input_path: Option<Path>)
                  -> (Input, Option<Path>) {
        (input, input_path)
    }

    fn no_input(&mut self,
                m: &getopts::Matches,
                o: &config::Options,
                odir: &Option<Path>,
                ofile: &Option<Path>,
                r: &diagnostics::registry::Registry)
                -> Option<(Input, Option<Path>)> {
        self.default_calls.no_input(m, o, odir, ofile, r);

        // This is not optimal error handling.
        panic!("No input supplied to stupid-stats");
    }

    fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> {
        ...
    }
}
```
We don't do anything for either of the callbacks, nor do we change the input if the user supplies it. If they don't, we just `panic!`, this is the simplest way to handle the error, but not very user-friendly, a real tool would give a constructive message or perform a default action.
In `build_controller` we construct our `CompileController`. We only want to parse, and we want to inspect macros before expansion, so we make compilation stop after the first phase (parsing). The callback after that phase is where the tool does its actual work by walking the AST. We do that by creating an AST visitor and making it walk the AST from the top (the crate root). Once we've walked the crate, we print the stats we've collected:
```rust
fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> {
    // We mostly want to do what rustc does, which is what basic() will return.
    let mut control = driver::CompileController::basic();
    // But we only need the AST, so we can stop compilation after parsing.
    control.after_parse.stop = Compilation::Stop;

    // And when we stop after parsing we'll call this closure.
    // Note that this will give us an AST before macro expansions, which is
    // not usually what you want.
    control.after_parse.callback = box |state| {
        // Which extracts information about the compiled crate...
        let krate = state.krate.unwrap();

        // ...and walks the AST, collecting stats.
        let mut visitor = StupidVisitor::new();
        visit::walk_crate(&mut visitor, krate);

        // And finally prints out the stupid stats that we collected.
        let cratename = match attr::find_crate_name(&krate.attrs[]) {
            Some(name) => name.to_string(),
            None => String::from_str("unknown_crate"),
        };
        println!("In crate: {},\n", cratename);
        println!("Found {} uses of `println!`;", visitor.println_count);

        let (common, common_percent, four_percent) = visitor.compute_arg_stats();
        println!("The most common number of arguments is {} ({:.0}% of all functions);",
                 common, common_percent);
        println!("{:.0}% of functions have four or more arguments.", four_percent);
    };

    control
}
```
That is all it takes to create your own drop-in compiler replacement or custom compiler! For the sake of completeness I'll go over the rest of the stupid-stats tool.
```rust
struct StupidVisitor {
    println_count: usize,
    arg_counts: Vec<usize>,
}
```
The `StupidVisitor` struct just keeps track of the number of `println!`s it has seen and the count for each number of arguments. It implements `syntax::visit::Visitor` to walk the AST. Mostly we just use the default methods, these walk the AST taking no action. We override `visit_item` and `visit_mac` to implement custom behaviour when we walk into items (items include functions, modules, traits, structs, and so forth, we're only interested in functions) and macros:
```rust
impl<'v> visit::Visitor<'v> for StupidVisitor {
    fn visit_item(&mut self, i: &'v ast::Item) {
        match i.node {
            ast::Item_::ItemFn(ref decl, _, _, _, _) => {
                // Record the number of args.
                self.increment_args(decl.inputs.len());
            }
            _ => {}
        }
        // Keep walking.
        visit::walk_item(self, i)
    }

    fn visit_mac(&mut self, mac: &'v ast::Mac) {
        // Find its name and check if it is "println".
        let ast::Mac_::MacInvocTT(ref path, _, _) = mac.node;
        if path_to_string(path) == "println" {
            self.println_count += 1;
        }
        // Keep walking.
        visit::walk_mac(self, mac)
    }
}
```
The `increment_args` method increments the correct count in `StupidVisitor::arg_counts`. After we're done walking, `compute_arg_stats` does some pretty basic maths to come up with the stats we want about arguments.
What next?
These APIs are pretty new and have a long way to go until they're really good. If there are improvements you'd like to see or things you'd like to be able to do, let me know in a comment or GitHub issue. In particular, it's not clear to me exactly what extra flexibility is required. If you have an existing tool that would be suited to this setup, please try it out and let me know if you have problems.
It'd be great to see Rustdoc converted to using these APIs, if that is possible (although long term, I'd prefer to see Rustdoc run on the output from save-analysis, rather than doing its own analysis). Other parts of the compiler (e.g., pretty printing, testing) could be refactored to use these APIs internally (I already changed save-analysis to use `CompileController`). I've been experimenting with a prototype rustfmt which also uses these APIs.
Appendix B: Background topics

This section covers common compiler terminology that arises in this guide. We try to give the general definition of each term, along with some Rust-specific context.

What is a control-flow graph?

A control-flow graph is a common term from compilers. If you've ever used a flow-chart, then the concept of a control-flow graph will be pretty familiar to you. It's a representation of your program that exposes the underlying control flow in a very clear way.

A control-flow graph is structured as a set of basic blocks connected by edges. The key idea of a basic block is that it is a set of statements that execute "together" - that is, whenever you branch to a basic block, you start at the first statement and then execute all the rest in order. Only at the end of the block is there the possibility of branching to more than one place (in MIR, we call that final statement the terminator):
```
bb0: {
    statement0;
    statement1;
    statement2;
    ...
    terminator;
}
```
Many of the expressions that you are used to in Rust compile down to multiple basic blocks. For example, consider an if statement:
```
a = 1;
if some_variable {
    b = 1;
} else {
    c = 1;
}
d = 1;
```
This would compile down into four basic blocks:
```
BB0: {
    a = 1;
    if some_variable { goto BB1 } else { goto BB2 }
}

BB1: {
    b = 1;
    goto BB3;
}

BB2: {
    c = 1;
    goto BB3;
}

BB3: {
    d = 1;
    ...;
}
```
When using a control-flow graph, a loop simply appears as a cycle in the graph, and the `break` keyword translates into a path out of that cycle.
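For instance, a `loop` that conditionally `break`s might lower to something like the following sketch (in the same informal notation as above):

```
BB0: {
    if exit_condition { goto BB2 } else { goto BB1 }
}

BB1: {
    // ... loop body ...
    goto BB0;   // the back edge; this edge is what makes the graph cyclic
}

BB2: {
    // code after the loop; `break` corresponds to the edge leading here
    ...;
}
```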
What is a dataflow analysis?

Static Program Analysis by Anders Møller and Michael I. Schwartzbach is an incredible resource!

to be written
What is "universally quantified"? What about "existentially quantified"?

to be written
What is co- and contra-variance?

Check out the subtyping chapter from the Rust Nomicon.

See the variance chapter of this guide for more info on how the type checker handles variance.
What is a "free region" or a "free variable"? What about "bound region"?

Let's describe the concepts of free vs bound in terms of program variables, since that's the thing we're most familiar with.

- Consider this expression, which creates a closure: `|a, b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments that the closure will be given when it is called. We say that the `a` and `b` there are bound to the closure, and that the closure signature `|a, b|` is a binder for the names `a` and `b` (because any references to `a` or `b` within refer to the variables that it introduces).
- Consider this expression: `a + b`. In this expression, `a` and `b` refer to local variables that are defined outside of the expression. We say that those variables appear free in the expression (i.e., they are free, not bound).
So there you have it: a variable "appears free" in some expression/statement/whatever if it refers to something defined outside of that expression/statement/whatever. Equivalently, we can then refer to the "free variables" of an expression - which is just the set of variables that "appear free".
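As a small (hypothetical) Rust illustration of the same idea:

```rust
fn main() {
    let a = 1;
    let b = 2;

    // In the expression `a + b` below, `a` and `b` appear *free*:
    // they refer to variables defined outside the expression itself.
    let c = a + b;

    // Inside the closure body, `x` and `y` are *bound*: the closure
    // signature `|x, y|` is the binder that introduces them.
    let add = |x: i32, y: i32| x + y;

    println!("{} {}", c, add(a, b));
}
```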
So what does this have to do with regions? Well, we can apply the analogous concept to types and regions.
For example, in the type `&'a u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it does not.
Appendix C: Glossary

The compiler uses a number of...idiosyncratic abbreviations and things. This glossary attempts to list them and give you a few pointers for understanding them better.
Term | Meaning
---|---
AST | the abstract syntax tree produced by the syntax crate; reflects user syntax very closely.
binder | a "binder" is a place where a variable or type is declared; for example, the `<T>` is a binder for the generic type parameter `T` in `fn foo<T>(..)`, and `\|a\| ...` is a binder for the parameter `a`. See the background chapter for more.
bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expression `\|a\| a * 2`. See the background chapter for more.
codegen | the code to translate MIR into LLVM IR.
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
completeness | completeness is a technical term in type theory. Completeness means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness").
control-flow graph | a representation of the control-flow of a program; see the background chapter for more.
CTFE | Compile-Time Function Evaluation. This is the ability of the compiler to evaluate `const fn`s at compile time. This is part of the compiler's constant evaluation system. (see more)
cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. (see more)
data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see the background chapter for more.
DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
Double pointer | a pointer with additional metadata. See "fat pointer" for more.
drop glue | (internal) compiler-generated instructions that handle calling the destructors (`Drop`) for data types.
DST | Dynamically-Sized Type. A type for which the compiler cannot statically know the size in memory (e.g. `str` or `[u8]`). Such types don't implement `Sized` and cannot be allocated on the stack. They can only occur as the last field in a struct, and can only be used behind a pointer (e.g. `&str` or `&[u8]`).
early-bound lifetime | a lifetime region that is substituted at its definition site. Bound in an item's `Generics` and substituted using a `Substs`. Contrast with late-bound lifetime. (see more)
empty type | see "uninhabited type".
Fat pointer | a two-word value carrying the address of some value, along with some further information necessary to put the value to use. Rust includes two kinds of "fat pointers": references to slices, and trait objects. A reference to a slice carries the starting address of the slice and its length. A trait object carries a value's address and a pointer to the trait's implementation appropriate to that value. "Fat pointers" are also known as "wide pointers" or "double pointers".
free variable | a "free variable" is one that is not bound within an expression or term; see the background chapter for more.
'gcx | the lifetime of the global arena. (see more)
generics | the set of generic type parameters defined on a type or item.
HIR | the High-level IR, created by lowering and desugaring the AST. (see more)
HirId | identifies a particular node in the HIR by combining a def-id with an "intra-definition offset".
HIR Map | the HIR map, accessible via `tcx.hir`, allows you to quickly navigate the HIR and convert between various forms of identifiers.
ICE | internal compiler error. Produced when the compiler crashes.
ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled.
inference variable | when doing type or region inference, an "inference variable" is a special kind of type/region that represents what you are trying to infer. Think of x in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type.
infcx | the inference context (see `librustc/infer`).
intern | interning refers to storing certain frequently-used constant data, such as strings, and then referring to the data by an identifier (e.g. a `Symbol`) rather than the data itself, to reduce memory usage.
IR | Intermediate Representation. A general term in compilers. During compilation, the code is transformed from raw source (ASCII text) into various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it.
IRLO | IRLO or irlo is sometimes used as an abbreviation for internals.rust-lang.org.
item | a kind of "definition" in the language, such as a static, const, use statement, module, struct, etc. Concretely, this corresponds to the `Item` type.
lang item | items that represent concepts intrinsic to the language itself, such as special built-in traits like `Sync` and `Send`; traits representing operations such as `Add`; and functions that are called by the compiler. (see more)
late-bound lifetime | a lifetime region that is substituted at its call site. Bound in a HRTB and substituted by specific functions in the compiler, such as `liberate_late_bound_regions`. Contrast with early-bound lifetime. (see more)
local crate | the crate currently being compiled.
LTO | Link-Time Optimizations. A set of optimizations offered by LLVM that occur just before the final binary is linked. These include optimizations like removing functions that are never used in the final program, for example. ThinLTO is a variant of LTO that aims to be a bit more scalable and efficient, but possibly sacrifices some optimizations. You may also read about "FatLTO" in issues in the Rust repo, which is the nickname given to non-Thin LTO. LLVM documentation: lto and thinlto.
LLVM | (actually not an acronym :P) an open-source compiler backend. It accepts LLVM IR and outputs native binaries. Various languages (e.g. Rust) can then implement a compiler front-end that outputs LLVM IR and use LLVM to compile to all the platforms LLVM supports.
memoize | memoization is the process of storing the results of (pure) computations (such as pure function calls) to avoid having to repeat them in the future. This is typically a trade-off between execution speed and memory usage.
MIR | the Mid-level IR that is created after type-checking for use by borrowck and codegen. (see more)
miri | an interpreter for MIR used for constant evaluation. (see more)
normalize | a general term for converting to a more canonical form, but in the case of rustc typically refers to associated type normalization.
newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
NLL | non-lexical lifetimes, an extension to Rust's borrowing system to make it be based on the control-flow graph.
node-id or NodeId | an index identifying a particular node in the AST or HIR; being gradually phased out and replaced with `HirId`.
obligation | something that must be proven by the trait system. (see more)
projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an "associated type projection".
promoted constants | constants extracted from a function and lifted to static scope; see this section for more details.
provider | the function that executes a query. (see more)
quantified | in math or logic, existential and universal quantification are used to ask questions like "is there any type T for which this is true?" or "is this true for all types T?"; see the background chapter for more.
query | perhaps some sub-computation during compilation. (see more)
region | another term for "lifetime", often used in the literature and in the borrow checker.
rib | a data structure in the name resolver that keeps track of a single scope for names. (see more)
sess | the compiler session, which stores global data used throughout compilation.
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
sigil | like a keyword, but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
placeholder | NOTE: skolemization is deprecated by placeholder. A way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`), as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See the chapter on placeholders and universes for more details.
soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then a program that type-checks is type-safe; i.e., we can never (in safe Rust) force a value into a variable of the wrong type. (see also "completeness").
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All of this is packed into a few bytes (really, it's an index into a table). See the `Span` datatype for more.
substs | the substitutions for a given generic type or item (e.g. the `i32` and `u32` in `HashMap<i32, u32>`).
tcx | the "typing context", the main data structure of the compiler. (see more)
'tcx | the lifetime of the currently active inference context. (see more)
trait reference | the name of a trait along with a suitable set of input types/lifetimes. (see more)
token | the smallest unit of parsing. Tokens are produced after lexing. (see more)
TLS | Thread-Local Storage. Variables may be defined so that each thread has its own copy (rather than all threads sharing the variable). This has some interactions with LLVM. Not all platforms support TLS.
trans | the code to translate MIR into LLVM IR. Renamed to codegen.
trait reference | a trait and values for its type parameters. (see more)
ty | the internal representation of a type. (see more)
UFCS | Universal Function Call Syntax. An unambiguous syntax for calling a method. (see more)
uninhabited type | a type which has no values. This is not the same as a ZST, which has exactly one value. An example of an uninhabited type is `enum Foo {}`, which has no variants, and so can never be created. The compiler can treat code that deals with uninhabited types as dead code, since there is no such value to be manipulated. `!` (the never type) is an uninhabited type. Uninhabited types are also called "empty types".
upvar | a variable captured by a closure from outside the closure.
variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec<T>` is a subtype of `Vec<U>` because `Vec` is covariant in its generic parameter. See the background chapter for a more general explanation. See the variance chapter for an explanation of how type checking handles variance.
Wide pointer | a pointer with additional metadata. See "fat pointer" for more.
ZST | Zero-Sized Type. A type whose values have size 0 bytes. Since `2^0 = 1`, such types can have exactly one value. For example, `()` (unit) is a ZST. `struct Foo;` is also a ZST. The compiler can do some nice optimizations around ZSTs.
Appendix D: Code Index

rustc has a lot of important data structures. This is an attempt to give some guidance on where to learn more about some of the key data structures of the compiler.

Item | Kind | Short description | Chapter | Declaration
---|---|---|---|---
`BodyId` | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc/hir/mod.rs
`Compiler` | struct | Represents a compiler session and can be used to drive a compilation | The Rustc Driver and Interface | src/librustc_interface/interface.rs
`ast::Crate` | struct | A syntax-level representation of a parsed crate | The parser | src/libsyntax/ast.rs
`hir::Crate` | struct | A more abstract, compiler-friendly form of a crate's AST | The HIR | src/librustc/hir/mod.rs
`DefId` | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc/hir/def_id.rs
`DiagnosticBuilder` | struct | A struct for building up compiler diagnostics, such as errors or lints | Emitting Diagnostics | src/librustc_errors/diagnostic_builder.rs
`DocContext` | struct | A state container used by rustdoc when crawling through a crate to gather its documentation | Rustdoc | src/librustdoc/core.rs
`HirId` | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc/hir/mod.rs
`NodeId` | struct | One of four types of HIR node identifiers; being phased out | Identifiers in the HIR | src/libsyntax/ast.rs
`P` | struct | An owned immutable smart pointer. By contrast, `&T` is not owned, and `Box<T>` is not immutable | None | src/libsyntax/ptr.rs
`ParamEnv` | struct | Information about generic parameters or `Self`, useful while working with associated or generic items | Parameter Environment | src/librustc/ty/mod.rs
`ParseSess` | struct | This struct contains information about a parsing session | The parser | src/libsyntax/parse/mod.rs
`Query` | struct | Represents the result of a query to the `Compiler` interface and allows stealing, borrowing, and returning the results of compiler passes | The Rustc Driver and Interface | src/librustc_interface/queries.rs
`Rib` | struct | Represents a single scope of names | Name resolution | src/librustc_resolve/lib.rs
`Session` | struct | The data associated with a compilation session | The parser, The Rustc Driver and Interface | src/librustc/session/mod.rs
`SourceFile` | struct | Part of the `SourceMap`. Maps AST nodes to their source code for a single source file. Was previously called `FileMap` | The parser | src/libsyntax_pos/lib.rs
`SourceMap` | struct | Maps AST nodes to their source code. It is composed of `SourceFile`s. Was previously called `CodeMap` | The parser | src/libsyntax/source_map.rs
`Span` | struct | A location in the user's source code, used for error reporting primarily | Emitting Diagnostics | src/libsyntax_pos/span_encoding.rs
`StringReader` | struct | The lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | The parser | src/libsyntax/parse/lexer/mod.rs
`syntax::token_stream::TokenStream` | struct | An abstract sequence of tokens, organized into `TokenTree`s | The parser, Macro expansion | src/libsyntax/tokenstream.rs
`TraitDef` | struct | This struct contains a trait's definition with type information | The `ty` modules | src/librustc/ty/trait_def.rs
`TraitRef` | struct | The combination of a trait and its input types (e.g. `P0: Trait<P1...Pn>`) | Trait Solving: Goals and Clauses, Trait Solving: Lowering impls | src/librustc/ty/sty.rs
`Ty<'tcx>` | struct | This is the internal representation of a type used for type checking | Type checking | src/librustc/ty/mod.rs
`TyCtxt<'cx, 'tcx, 'tcx>` | struct | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries | The `ty` modules | src/librustc/ty/context.rs