The schema used in graph.sqlite is defined by schema.sql. Since Callgraph is new, the schema is fairly simple and is expected to evolve. If you have suggestions or use cases that require changes, please file a bug. (In particular, there are several pending improvements mentioned below.)
The schema contains the following tables:
node. This represents a function, and contains the following fields:
  - id INTEGER PRIMARY KEY, a unique identifier for the node.
  - name TEXT, the fully-qualified function or method name, including return type and parameters. See below.
  - isPtr INTEGER, 1 if the function call (in this case, the callee) is a function pointer, 0 otherwise.
  - isVirtual INTEGER, 1 if the function call is a virtual method, 0 otherwise.
  - loc TEXT, the fully-resolved source file containing the declaration or definition of the function. See below.
  - UNIQUE (name, loc) ON CONFLICT IGNORE
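As a rough illustration of how the UNIQUE (name, loc) ON CONFLICT IGNORE constraint behaves, the sketch below rebuilds the node table in an in-memory SQLite database using Python's standard sqlite3 module. The CREATE TABLE statement is reconstructed from the field list above and may not match schema.sql exactly; the helper insert_node and the file paths are hypothetical.

```python
import sqlite3

# Reconstructed from the field list above; the real schema.sql may differ.
NODE_DDL = """
CREATE TABLE node (
    id INTEGER PRIMARY KEY,
    name TEXT,
    isPtr INTEGER,
    isVirtual INTEGER,
    loc TEXT,
    UNIQUE (name, loc) ON CONFLICT IGNORE
)
"""


def insert_node(conn, name, loc, is_ptr=0, is_virtual=0):
    """Insert a node; a duplicate (name, loc) pair is silently ignored."""
    conn.execute(
        "INSERT INTO node (name, isPtr, isVirtual, loc) VALUES (?, ?, ?, ?)",
        (name, is_ptr, is_virtual, loc),
    )


conn = sqlite3.connect(":memory:")
conn.execute(NODE_DDL)

# Hypothetical paths, used only for illustration.
insert_node(conn, "int ns::cs::method(float,double)", "/src/foo.h")
insert_node(conn, "int ns::cs::method(float,double)", "/src/foo.h")  # ignored
insert_node(conn, "int ns::cs::method(float,double)", "/src/bar.h")  # new loc

node_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

Here node_count ends up as 2: the repeated (name, loc) pair is dropped without raising an error, while the same name at a different loc is stored as a separate node.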
The fully-qualified method name includes the return type, namespaces, class name, method or function name, and parameter types. For instance, the code

    namespace ns {
      class cs {
        int method(float, double);
      };
    }

would result in a name of int ns::cs::method(float,double). For nonstatic methods, the implicit this parameter is excluded from the parameter list, though we may want to denote nonstatic methods with a separate field. The global namespace is indicated by a leading ::. (Types are currently not fully-qualified.)

The loc refers to the source file in which the declaration or definition appeared. For class methods, the declaration context will always be available, and this location will refer to the file containing the declaration (for instance, foo.h). For function calls, this location may refer to the declaration or definition, and cannot presently be used in concert with name as a reliably unique identifier. (This could potentially be solved by implementing linker-style name resolution rules in Callgraph.) For function pointers, it is impossible to determine a location, and loc will be the empty string. (You should guard on this case using isPtr.) For compiler built-ins, loc will be <built-in>.

Since symlinks are prevalent in the Mozilla build process, the path in loc is always fully-resolved; that is, it will be an absolute path which is guaranteed to have a one-to-one mapping with a particular file in the source or object tree. (In the case of generated files, such as from xpidlgen, it will point into the object tree. Note also that one can override gcc's notion of the current source file using the #line directive; in the likely case where this file doesn't exist, the resulting loc will not be fully-resolved. Caveat emptor. Thankfully, this practice is uncommon.)

The
(name, loc) pair is intended to be a unique identifier for a given function or method. (Though this currently breaks down for functions and function pointers.) In the future, we may include a mangledName field in the table, which would allow more consistency with linker rules regarding function name resolution and uniqueness. We may also add a field indicating whether the function definition has been seen (which would distinguish, say, calls into library functions from functions that simply don't call anything else).

edge. This represents a call between functions, and contains the following fields:
  - caller INTEGER REFERENCES node, the id of a node representing the caller function.
  - callee INTEGER REFERENCES node, likewise for the callee.
  - PRIMARY KEY(caller, callee) ON CONFLICT IGNORE
The primary key is currently unique on caller and callee, meaning that if the caller calls the callee multiple times, only one edge will exist in the table. We may change this if there are cases where the number of calls is relevant.
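To make the node and edge relationship concrete, here is a minimal sketch in Python (standard sqlite3 module) that lists the direct callees of a function while skipping function pointers and compiler built-ins, as the notes on loc suggest. The table definitions are reconstructed from the field lists in this article and may differ from schema.sql; the function callees_with_location and every node name and path in the sample data are hypothetical.

```python
import sqlite3

# Reconstructed from the field lists in this article; schema.sql may differ.
SCHEMA = """
CREATE TABLE node (
    id INTEGER PRIMARY KEY, name TEXT, isPtr INTEGER,
    isVirtual INTEGER, loc TEXT,
    UNIQUE (name, loc) ON CONFLICT IGNORE
);
CREATE TABLE edge (
    caller INTEGER REFERENCES node,
    callee INTEGER REFERENCES node,
    PRIMARY KEY (caller, callee) ON CONFLICT IGNORE
);
"""


def callees_with_location(conn, caller_name):
    """Names of direct callees that have a real source location."""
    rows = conn.execute(
        """
        SELECT callee.name
        FROM edge
        JOIN node AS caller ON caller.id = edge.caller
        JOIN node AS callee ON callee.id = edge.callee
        WHERE caller.name = ?
          AND callee.isPtr = 0            -- function pointers have an empty loc
          AND callee.loc != '<built-in>'  -- skip compiler built-ins
        ORDER BY callee.name
        """,
        (caller_name,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample data: one caller, one ordinary callee,
# one function-pointer callee, and one built-in callee.
conn.executemany(
    "INSERT INTO node (id, name, isPtr, isVirtual, loc) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "void ::main_loop()", 0, 0, "/src/loop.cpp"),
        (2, "int ns::cs::method(float,double)", 0, 0, "/src/foo.h"),
        (3, "void (*)(int)", 1, 0, ""),
        (4, "int ::__builtin_abs(int)", 0, 0, "<built-in>"),
    ],
)
# The edge (1, 2) is inserted twice; the primary key keeps a single row.
conn.executemany("INSERT INTO edge VALUES (?, ?)", [(1, 2), (1, 2), (1, 3), (1, 4)])

edge_count = conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]
callees = callees_with_location(conn, "void ::main_loop()")
```

With this sample data, edge_count is 3 (the repeated call collapses into one edge) and callees contains only int ns::cs::method(float,double).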
implementors. This table provides information about the inheritance chain of C++ classes, specifically which virtual interface methods are overridden or implemented by which classes. It contains the following fields:
  - implementor TEXT, the fully-qualified class name of the class which implements the method.
  - interface TEXT, the fully-qualified class name of the class which declares the method to be virtual (pure or non-pure). See below.
  - method TEXT, the method name (not including return type or arguments).
  - loc TEXT, the fully-resolved source file containing the declaration of the interface class.
  - id INTEGER PRIMARY KEY, a unique identifier for the entry.
  - UNIQUE (implementor, interface, method, loc) ON CONFLICT IGNORE
A class is considered an interface if it declares the method in question to be virtual (either pure or non-pure), while an implementor overrides it with a declaration that is non-pure. (Whether the implementor actually defines the function is not considered.) For instance, consider the code
    class iFoo {
    public:
      virtual void method1(float) = 0;
      virtual void method2(float) = 0;
    };

    class iBar : public iFoo {
    public:
      virtual void method1(float) = 0;
      virtual void method2(float);
    };

    class Bar : public iBar {
    public:
      virtual void method1(float);
      virtual void method2(float);
    };
The class iBar would be considered an implementor of method2, while class Bar would be an implementor of both method1 and method2. For implementor iBar, iFoo::method2 would be considered an interface method. For implementor Bar, both methods on iFoo and iBar would be listed as interface methods. Uniqueness is determined by implementor, interface, method, and loc. (Note that the parameter list to the virtual method is also required to determine uniqueness, and will be added.)