Introduction
Treehydra is meant to be used for analyses that need more detail than Dehydra's flattened ASTs. Instead of representing code in "easy" form like Dehydra, Treehydra relies on GIMPLE, the GCC Internals "middle-end" intermediate representation. Treehydra simply reflects the GCC Internals structures into JavaScript.
For now, the best way to learn GIMPLE is to look at existing code as there is little newbie documentation to be had.
Callbacks and GCC Intermediate Representations
Like Dehydra, Treehydra sends program representations to the user JavaScript via callbacks. There are 3 Treehydra callbacks. Note that the callbacks interact with other Treehydra options. To make things easier, we will first present recipes for accessing specific GCC intermediate representations.
Treehydra provides the following callbacks to analysis scripts: process_tree_decl, process_tree, process_tree_type, and process_cp_pre_genericize. FIXME document all these callbacks.
C++ ASTs
C++ ASTs are provided via the process_cp_pre_genericize callback. This is called just before GCC converts the C++-specific AST to the GENERIC (language-independent) representation. Example Treehydra script:
function process_cp_pre_genericize(fn) {
// fn is a FUNCTION_DECL, DECL_SAVED_TREE(fn) is AST
print(fn);
}
GIMPLE ASTs
GIMPLE ASTs are provided via the process_tree callback. By default, this is called just after GCC converts the AST to GIMPLE, before any further lowering or optimization. Example:
function process_tree(fn) {
// fn is a FUNCTION_DECL, DECL_SAVED_TREE(fn) is AST
print(fn);
}
GIMPLE CFGs
process_tree can also provide GIMPLE CFGs. To get CFGs, you must instruct Treehydra to call process_tree after CFG construction. In this case, GIMPLE ASTs will not be available, as GCC does not save earlier intermediate representations--they are modified in place. Thus, it is not currently possible to get both GIMPLE ASTs and CFGs in the same run of Treehydra.
require({ after_gcc_pass: "cfg" });
include('gcc_util.js'); // for function_decl_cfg
include('gcc_print.js');
function process_tree(fn) {
print("function " + decl_name(fn)); // fn is a FUNCTION_DECL
let cfg = function_decl_cfg(fn);
for (let bb in cfg_bb_iterator(cfg)) {
print(" basic block " + bb_label(cfg, bb));
for (let isn in bb_isn_iterator(bb)) {
print(" " + isn_display(isn));
}
}
}
Post-Inlining GIMPLE CFGs
Treehydra can be positioned after any pass, allowing you access various levels of GIMPLE. In this example, we show how to access the GIMPLE just after function inlining is performed. (This allows a simple version of interprocedural analysis.) At this point, GCC is using a different internal structure, so Treehydra uses a different callback, process_cgraph.
require({ after_gcc_pass: "einline" + (isGCC42 ? "" : "_ipa")});
include('gcc_util.js'); // for function_decl_cfg
function process_cgraph(cgraph) {
// cgraph is a GCC structure representing a group of functions
// within the call graph. Iterate over the functions like this.
for (let node = cgraph; node; node = node.next) {
let fn = node.decl;
if (DECL_STRUCT_FUNCTION(fn)) { // fn has a body
print(fn);
let cfg = function_decl_cfg(fn);
}
}
GIMPLE Reference
For a detailed description of GIMPLE see gcc/tree.def and gcc/cp/operators.def
See also treehydra.js, gcc_compat.js, gcc_util.js, and gcc_print.js in the Treehydra libs directory, which have many ports of GCC macros and other functions for conveniently accessing GIMPLE data in JavaScript.
ESP Abstract Interpretation Library
Treehydra includes a library for creating abstract interpretation analyses based on ESP. Users only need to define their ESP property variables, abstract values, and the flow semantics of GIMPLE statements, and the library will do all of the substate tracking and fixed-point solving.
The library is at unstable/esp.js. There is a sample application that checks for double-locking in a simple mutex API in test/esp_lock.js.
Abstract interpretation is a big topic, which is not yet fully documented here. For now, here is an explanation of some of the terminology that is often confusing:
concrete value. A value in the programming language being analyzed. E.g., (int) 3.
abstract value. A value used in the analysis script that represents a set of concrete values. E.g., nonzero. In ESP, these are user-defined.
concrete state. A memory state in the programming language being analyzed. In other words, a mapping from program variables to concrete values.
abstract state. Conceptually, this is an object used in the analysis that represents a set of concrete states. In the ESP library, these are ESP.State objects.
bottom. This is a special abstract state the represents the empty set of concrete states. If the state is bottom at a certain program point, that means that program point is unreachable. In ESP, this is ESP.NOT_REACHED.
top. This is a special abstract state that represents all possible states. top is also used to name a special abstract value that represents all possible concrete values. Thus, top means the analysis has no information about that value or state.
meet. Meet is a relation on abstract values used to filter states at conditional statements. For example, if the current abstract value of a variable x is nonzero, and the analysis goes inside an if statement with the condition x == 1, then in the new abstract value of x will be nonzero meet 1. ESP uses a user-defined meet function supplied to the Analysis constructor that should return the intersection of two abstract values. It is also acceptable for meet to return a value that contains the intersection--this may cause the analysis to have more false positives but will not introduce false negatives.
join. Join is a relation on abstract values used to merge states at control flow merge points. For example, if ESP reaches a point where two control flow paths merge, and on one side the abstract value of x is 1, and on the other side, the abstract value of x is 2, then the new abstract value of x will be 1 join 2. ESP uses a built-in join function where join(a,a) -> a and join(a,b) -> top for a != b.