Declarative linting framework over the Clang AST.
***DEPRECATED*** On end-of-life support, may be removed in the future.
- C/C++/ObjC: Yes
- Java: No
- C#/.Net: No
For C/C++ and Objective-C, we provide a linters framework. Linters are checks about the syntax of the program: a check could be about a property, about code inside one method, or about a class or method having certain properties. We provide a few checks by default, and we have developed a domain-specific language (DSL) to make it easier to write new ones.
One of the major advantages of Infer compared with other static analyzers is that it performs sophisticated inter-procedural and inter-file analysis. That is, Infer can detect bugs that involve tracking values through many procedure calls, where the procedures may live in different files. These can be very subtle bugs, and designing static analyses to find them is quite involved and normally requires deep static analysis expertise.
However, many important software bugs are confined to the code of a single procedure (such bugs are called intra-procedural). To detect these bugs, simpler analyses may suffice, which do not require deep technical expertise in static analysis. Often these bugs can be expressed by referring to the syntax of the program or the types of certain expressions. We have defined a new language to easily design checkers that identify these kinds of bugs. The language is called AL (AST Language) and its main feature is the ability to reason about the Abstract Syntax Tree of a program in a concise, declarative way. AL's checkers are interpreted by Infer to analyze programs. Thus, to detect new kinds of bugs in Infer one can just write a check in AL. As we will see in more detail later, writing AL formulas also requires predicates: simple functions that check a property of the AST. Predicates are written in OCaml inside Infer, so adding one requires a bit of OCaml knowledge and familiarity with the OCaml data structure for the clang AST.
When you write a linter that traverses the AST of some programs to check some property, you probably need to understand what the AST looks like. You can get the AST of programs using clang directly, or using Infer.
If you have a clang command clang <clang arguments> File.m then you can get the AST with
You can also get the AST using Infer. One advantage of this is that you don't need to know the specific clang command, just the general build command. Moreover, what you get here is exactly the form of the AST that Infer has as input.
For this you need to install an OCaml package: opam install biniou. See the opam website for instructions on how to install opam.
Then, the AST can be created by Infer in debug mode. Call Infer with
This will, among other things, generate a file /path/to/File.m.ast.sh for every file /path/to/File.m that is being analyzed. Run this script with bash File.m.ast.sh and a file /path/to/File.m.ast.bdump will be generated, which contains the AST of the program in bdump format (similar to JSON). If you get an error about bdump not being found, you may need to run eval $(opam env) to get the bdump executable (provided by the biniou opam package) into your PATH.
For general info on the clang AST, you can check out clang's website.
Let's start with an example. Suppose we want to write the following Objective-C linter:
"a property containing the word 'delegate', but not containing the word 'queue' should not be declared strong".
We can write this property in the following way:
The linter definition starts with the keyword DEFINE-CHECKER followed by the checker's name. The first LET clause defines the formula variable name_contains_delegate using the predicate declaration_has_name, which returns true or false depending on whether the property's name contains a word in the language of the regular expression [dD]elegate. In general, a predicate is a simple atomic formula evaluated on an AST node. The list of available predicates is continuously growing, and if you need a new predicate you can add it in OCaml. Formula variables can be used to simplify other definitions. The SET report_when clause is mandatory and defines a formula that, when it evaluates to true, tells Infer to report an error. In the case above, the formula says that we should report when visiting an ObjCPropertyDecl (the AST node declaring a property in Objective-C) where the following holds: the name contains "delegate/Delegate" (name_contains_delegate), the name doesn't contain "queue/Queue" (name_does_not_contain_queue), and the node is defining a "strong" property. The SET message clause defines the error message that will be displayed to the user. Notice that the message can include placeholders. Placeholders are evaluated by Infer and substituted by their current value when the error message is reported; in this case, the value is the name of the declaration. The SET suggestion clause defines an optional hint to give to the programmer on how to fix the problem.
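To make the logic of this checker concrete, here is a minimal Python sketch of the same condition. It is only a model, not how Infer evaluates AL: the PropertyDecl class and the should_report function are hypothetical, and the [qQ]ueue regular expression is an assumption based on the "queue/Queue" wording above.

```python
import re
from dataclasses import dataclass


@dataclass
class PropertyDecl:
    """Hypothetical stand-in for an ObjCPropertyDecl AST node."""
    name: str
    is_strong: bool


def declaration_has_name(decl: PropertyDecl, pattern: str) -> bool:
    # Models the AL predicate: true if the regex matches somewhere in the name.
    return re.search(pattern, decl.name) is not None


def should_report(decl: PropertyDecl) -> bool:
    # The two formula variables from the LET clauses.
    name_contains_delegate = declaration_has_name(decl, r"[dD]elegate")
    # Assumed regex for "queue/Queue".
    name_does_not_contain_queue = not declaration_has_name(decl, r"[qQ]ueue")
    # The report_when condition: all three conjuncts must hold.
    return name_contains_delegate and name_does_not_contain_queue and decl.is_strong
```

With this model, a strong property named viewDelegate is reported, while delegateQueue (contains "Queue") and a non-strong viewDelegate are not.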
The general structure of a checker is the following:
The default severity is WARNING and the default mode is ON, so these are optional. If the check is OFF it will only be available in debug mode. INFOs are generally also not reported, except with some specialized flags. Fields such as doc_url are used only for CI comments at the moment (in Phabricator). Path filters such as blacklist_path are optional; by default the rule is enabled everywhere. For specifying paths, one can use string constants ("File.m"), regexes (REGEXP("path/to/.*")), or variables. The variables stand for a list of paths, and are defined in a separate block.
It is possible to define macros that can be used in several checkers. This is done in the following way:
GLOBAL-MACROS is the section of an AL specification where one can define a list of global macros. In the example we are defining a macro which can now be used in checkers instead of its complex definition.
It is possible to import a library of macros and paths with the following command:
In an AL file, the command above imports and makes available all the macros and paths defined in the imported library.
The simplest formulas we can write are predicates. They are defined inside Infer. We provide a library, but if the predicate that you require is not available, you will need to extend the library. Here are some of the currently defined predicates:
In general, the parameters of predicates can be constants, variables, or regular expressions. Variables are used in macros, see below. Regexes are written with the REGEXP syntax, for example REGEXP("path/to/.*").
NOTE: The predicates that expect types, such as objc_method_has_nth_parameter_of_type, also accept regexes, but the syntax is a bit different: the regex is written with single quotes so that it can be embedded inside another string, for example has_type("REGEXP('NS.+')*"), which stands for a pointer to a class whose name starts with NS.
Formulas are defined using a variation of the CTL temporal logic. CTL is a logic expressing properties of a tree model. In the case of AL, the tree is the AST of the program. Formulas are defined according to the following grammar:
The first four cases (NOT, AND, OR, IMPLIES) are classic boolean operators with the usual semantics. The others are temporal operators describing how the truth value of a formula is evaluated in a tree. Let's consider them case by case.
|F1 HOLDS-UNTIL F2||from the current node, there exists a path where F1 holds at every node until F2 becomes true|
An example is depicted in the following tree. When F2 holds in a node, this is indicated between square brackets. The formula F1 HOLDS-UNTIL F2 holds in the green nodes.
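The semantics of HOLDS-UNTIL can be sketched in a few lines of Python. This is a minimal model under the standard CTL reading (a node where F2 already holds satisfies the formula), not Infer's implementation; the Node class, the tree, and the choice of which nodes satisfy F1 and F2 are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical AST node: a name plus a list of children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def holds_until(node: Node, f1: Callable, f2: Callable) -> bool:
    # F2 holds here: the path of length 0 is enough.
    if f2(node):
        return True
    # Otherwise F1 must hold here and some child must continue the path.
    return f1(node) and any(holds_until(c, f1, f2) for c in node.children)


# n1 -> n2 -> n3, and n1 -> n4. F1 holds in n1 and n2, F2 holds in n3.
n3, n4 = Node("n3"), Node("n4")
n2 = Node("n2", [n3])
n1 = Node("n1", [n2, n4])
f1 = lambda n: n.name in {"n1", "n2"}
f2 = lambda n: n.name == "n3"
satisfying = [n.name for n in (n1, n2, n3, n4) if holds_until(n, f1, f2)]
```

Here satisfying contains n1, n2 and n3: from each of them there is a path along which F1 holds until F2 becomes true, whereas n4 satisfies neither formula.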
|F HOLDS-EVENTUALLY||from the current node there exists a path where at some point F becomes true|
In the picture below, F HOLDS-EVENTUALLY holds in the green nodes, because from each of these nodes there is a path reaching a node where F holds. Note that it also holds in the node where F itself holds, because there exists a trivial path of length 0 from that node to itself.
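As a rough illustration, HOLDS-EVENTUALLY can be modelled as a recursive search for some descendant (or the node itself) where F holds. The Node class and the example tree below are hypothetical and are not taken from Infer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical AST node: a name plus a list of children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def holds_eventually(node: Node, f: Callable) -> bool:
    # True if F holds here (path of length 0) or somewhere below along some path.
    return f(node) or any(holds_eventually(c, f) for c in node.children)


# n1 -> n2 -> n3, and n1 -> n4. F holds only in n3.
n3, n4 = Node("n3"), Node("n4")
n2 = Node("n2", [n3])
n1 = Node("n1", [n2, n4])
f = lambda n: n.name == "n3"
satisfying = [n.name for n in (n1, n2, n3, n4) if holds_eventually(n, f)]
```

In this tree the formula holds in n1, n2 and n3 (the last one through the trivial path), but not in n4, which has no path to a node where F holds.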
|F HOLDS-EVERYWHERE-EVENTUALLY||in every path starting from the current node at some point F becomes true|
For example, in the tree below, the formula holds in every green node because every path starting from each of them eventually reaches a node where F holds.
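The difference from HOLDS-EVENTUALLY is the quantifier over paths: every path must reach F, not just one. A minimal Python sketch, assuming a finite tree in which a path ends at a leaf (the Node class and the tree are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical AST node: a name plus a list of children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def holds_everywhere_eventually(node: Node, f: Callable) -> bool:
    if f(node):
        return True
    # A leaf where F does not hold is a complete path that never reaches F.
    if not node.children:
        return False
    # Every child must itself lead to F along all of its paths.
    return all(holds_everywhere_eventually(c, f) for c in node.children)


# n1 -> n2 -> n3 and n1 -> n4; n5 -> n6 and n5 -> n7. F holds in n3, n4, n6.
n3, n4, n6, n7 = Node("n3"), Node("n4"), Node("n6"), Node("n7")
n2 = Node("n2", [n3])
n1 = Node("n1", [n2, n4])
n5 = Node("n5", [n6, n7])
f = lambda n: n.name in {"n3", "n4", "n6"}
```

In this model the formula holds in n1, because both of its paths reach a node where F holds, but not in n5, because the path ending in n7 never does.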
|F HOLDS-NEXT||from the current node (we are visiting) there exists a child where F is true|
In the tree below, the formula F HOLDS-NEXT is true only in n1, as it is the only node with a child where F holds (node n3). In AL, NEXT is a synonym of child because, in terms of a path in the tree, a child is the next node.
|F HOLDS-EVERYWHERE-NEXT||from the current node in every existing child F is true|
In the tree below, the formula F HOLDS-EVERYWHERE-NEXT is true in n1, as it is the only node for which F holds in every child (nodes n2, n3, and n7).
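The two NEXT operators only look one level down, which the following Python sketch tries to capture. It is a model under stated assumptions: the Node class and the tree are hypothetical, and the sketch assumes that a node without children does not satisfy HOLDS-EVERYWHERE-NEXT (the vacuously-true reading is also plausible and may be what an actual implementation does).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical AST node: a name plus a list of children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def holds_next(node: Node, f: Callable) -> bool:
    # Some child satisfies F.
    return any(f(c) for c in node.children)


def holds_everywhere_next(node: Node, f: Callable) -> bool:
    # Assumption: at least one child must exist, and all children satisfy F.
    return bool(node.children) and all(f(c) for c in node.children)


# n1 -> n2 -> n4, and n1 -> n3. F holds in n2 and n3.
n3, n4 = Node("n3"), Node("n4")
n2 = Node("n2", [n4])
n1 = Node("n1", [n2, n3])
f = lambda n: n.name in {"n2", "n3"}
```

Both formulas hold in n1 here; n2 satisfies neither, since its only child n4 does not satisfy F.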
|F HOLDS-ALWAYS||from the current node there exists a path where F holds at every node|
In the tree below, F HOLDS-ALWAYS holds in several nodes, including n8, because for each of these nodes there exists a path where F holds at each node in the path.
|F HOLDS-EVERYWHERE-ALWAYS||from the current node, in every path F holds at every node|
F HOLDS-EVERYWHERE-ALWAYS holds in several nodes, including n8, because in every path that starts from one of those nodes F holds at every node.
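The contrast between HOLDS-ALWAYS (some path) and HOLDS-EVERYWHERE-ALWAYS (all paths) can be sketched as follows. This is a minimal model on a finite tree where paths end at leaves; the Node class and the tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical AST node: a name plus a list of children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def holds_always(node: Node, f: Callable) -> bool:
    # F holds here, and either the path ends here or some child continues it.
    if not f(node):
        return False
    return not node.children or any(holds_always(c, f) for c in node.children)


def holds_everywhere_always(node: Node, f: Callable) -> bool:
    # F holds here and in every node of the subtree.
    return f(node) and all(holds_everywhere_always(c, f) for c in node.children)


# n1 -> n2 -> n3, and n1 -> n4. F holds everywhere except n4.
n3, n4 = Node("n3"), Node("n4")
n2 = Node("n2", [n3])
n1 = Node("n1", [n2, n4])
f = lambda n: n.name != "n4"
```

In this tree HOLDS-ALWAYS is true in n1 thanks to the path n1, n2, n3, while HOLDS-EVERYWHERE-ALWAYS is false in n1 because the path through n4 violates F; both are true in n2.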
|WHEN F HOLDS-IN-NODE node1,…,nodeK||we are in a node among node1,…,nodeK and F holds|
WHEN F HOLDS-IN-NODE over a list of nodes that includes n2 and n6 holds only in node n2, as it is the only node in the list where F holds.
Let's consider an example of a checker using the formula WHEN F HOLDS-IN-NODE node1,…,nodeK for checking that a property with pointer type should not be declared "assign":
The checker uses two predefined predicates: one that is true if the property being declared is assign, and is_property_pointer_type(), which is true if the property has a pointer type. We want to check both conditions only on nodes declaring properties, i.e., ObjCPropertyDecl.
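The shape of this check can be sketched in Python: the formula is only evaluated on nodes of the listed kinds, and is false everywhere else. The AstNode class and its boolean fields are hypothetical stand-ins for the AST node and for the two predicates, not Infer's data structures.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class AstNode:
    """Hypothetical AST node with the two facts the predicates would inspect."""
    kind: str
    is_assign: bool = False
    has_pointer_type: bool = False


def when_holds_in_node(node: AstNode, formula: Callable, kinds: Set[str]) -> bool:
    # True only if we are in one of the listed node kinds and the formula holds.
    return node.kind in kinds and formula(node)


def assign_and_pointer(node: AstNode) -> bool:
    # Conjunction of the two predicates used by the checker.
    return node.is_assign and node.has_pointer_type


def should_report(node: AstNode) -> bool:
    return when_holds_in_node(node, assign_and_pointer, {"ObjCPropertyDecl"})
```

A node of another kind is never reported in this model, even if both facts happen to be true for it, because the node-kind test fails first.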
|IN-NODE node1,…, nodeK WITH-TRANSITION t F HOLDS-EVENTUALLY||from the current node there exists a path which eventually reaches a node among “node1,…,nodeK” with a transition t reaching a child where F holds|
The following tree explains the concept:
The concept of transition is needed because of the special structure of the clang AST. Certain kinds of nodes, for example statements, have a list of children that are statements as well. In this case there is no special tag attached to the edge between the node and its children. Other nodes have records, where some of the fields point to other nodes. For example, a node representing a function declaration will have a record where one of the fields is body, which points to a statement representing the function's body. For records, we sometimes need to specify that we want a particular node reachable via a particular field (i.e., a transition).
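One way to picture transitions is as labelled edges: each node keeps its children grouped under the field that points to them. The Python sketch below models IN-NODE … WITH-TRANSITION … HOLDS-EVENTUALLY on such a structure. The Node class, the edge labels (Children, Body, Parameters) and the node kinds are illustrative assumptions, not the actual names or layout of Infer's AST.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    """Hypothetical AST node whose children are grouped by edge label."""
    kind: str
    fields: Dict[str, List["Node"]] = field(default_factory=dict)


def reaches_via_transition(node: Node, kinds: Set[str], transition: str,
                           f: Callable) -> bool:
    # Are we at one of the listed nodes, with F holding in a child reached
    # through the requested transition?
    if node.kind in kinds and any(f(c) for c in node.fields.get(transition, [])):
        return True
    # Otherwise keep searching along any edge below the current node.
    return any(reaches_via_transition(c, kinds, transition, f)
               for children in node.fields.values() for c in children)


body = Node("CompoundStmt")
func = Node("FunctionDecl", {"Body": [body], "Parameters": [Node("ParmVarDecl")]})
root = Node("TranslationUnitDecl", {"Children": [func]})
is_compound = lambda n: n.kind == "CompoundStmt"
```

Starting from root, the search succeeds for the Body transition of a FunctionDecl but fails for the Parameters transition, since no parameter is a CompoundStmt.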
Hint: A good way to learn how to write checkers is to look at the existing checkers in the file linters.al.
In the following we show a few examples of simple checks you may wish to write and the corresponding formulas:
- A check for flagging an Objective-C class that inherits from a class that shouldn't be subclassed.
- A check for flagging an Objective-C instance method call:
- A check for flagging an Objective-C instance method call of any method of a class:
- A check for flagging an Objective-C class method call:
- A check for flagging an Objective-C method call of a method with int return type:
- A check for flagging a variable declaration with type long
- A check for flagging a method that has a parameter of type A*
- A check for flagging a method that has all the parameters of type A* (and at least one)
- A check for flagging a method that has the 2nd parameter of type A*
- A check for flagging a protocol that inherits from a given protocol.
HOLDS-EVENTUALLY WITH-TRANSITION Protocol means follow the
Protocol branch in the AST until the condition holds.
- A check for flagging when a constructor is defined with a parameter of a type
that implements a given protocol (or that inherits from it).
HOLDS-NEXT WITH-TRANSITION Parameters means, starting in the
ObjCMethodDecl node, follow the
Parameters branch in the AST and check that the condition holds there.
- A check for flagging a variable declaration of type NSArray applied to A.
- A check for flagging using a property or variable that is not available in the
supported API. decl_unavailable_in_supported_ios_sdk is a predicate that works
on a declaration, checks the available attribute from the declaration and
compares it with the supported iOS SDK. Notice that we flag the occurrence of
the variable or property, but the attribute is in the declaration, so we need
PointerToDecl, which follows the pointer from the usage to the declaration.
- A check for flagging using a given namespace
- A check for flagging the use of given enum constants
When you write the message of your rule, you may want to specify which particular AST items were involved in the issue, such as a type or a variable name. For that, we provide a few placeholders that can be used in rules with the syntax %placeholder%; each is substituted by the corresponding AST information. At the moment there are placeholders that print the type of the node, the type of the node's child, and a string representation of the node. As with predicates, we can add more as needed.
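The substitution step can be pictured with a small Python sketch. It only illustrates the %placeholder% mechanism in general terms: the render_message function, the placeholder names and the behaviour for unknown placeholders (left untouched here) are assumptions, not Infer's actual implementation.

```python
import re
from typing import Dict


def render_message(template: str, values: Dict[str, str]) -> str:
    """Replace each %key% in the template with values[key], if present."""
    def substitute(match: "re.Match") -> str:
        key = match.group(1)
        # Assumption: unknown placeholders are left as they are.
        return values.get(key, match.group(0))

    return re.sub(r"%(\w+)%", substitute, template)


# Hypothetical placeholder names and values, for illustration only.
message = render_message(
    "Declaration %name% has type %type%",
    {"name": "viewDelegate", "type": "id"},
)
```

In this example message becomes "Declaration viewDelegate has type id".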
To test your rule you need to run it with Infer. If you are adding a new linter
you can test it in a separate al file that you can pass to Infer with the option
--linters-def-file file.al. Pass the option
--linters-developer-mode --linter <LINTER_NAME> to Infer to print debug
information and only run the linter you are developing, so it will be faster and
the debug info will be only about your linter.
To test your code, write a small example that triggers the rule, then run Infer on it with the options above. The bug should be printed on the screen. Moreover, the bug is recorded in a file inside infer-out, the results directory where Infer operates, which is created in the current directory. You can specify a different results directory with a command-line option.
If there are syntax errors or other parsing errors in your al file, you will get an error message when testing the rule; remember to use --linters-developer-mode when you are developing a rule. If the rule gets parsed but still doesn't behave as you expect, you can debug it by adding the following line to a test source file, at the line where you want to debug the rule: //INFER_BREAKPOINT. Then run Infer again in linters developer mode, and it will stop the execution of the linter on the line of the breakpoint. You can then follow the execution step by step. It shows the current formula that is being evaluated, and the current part of the AST that is being checked. A red node means that the formula failed; a green node means that it succeeded.
The linters are run by default when you run Infer. However, there is a way of running only the linters, which is faster than also running the rest of Infer's analysis: add the option --linters to the analysis command.
There are a few other command-line options that are useful for using or
developing new linters in Infer. Read about them in the
infer capture manual.
The following issue types are reported by this checker: