Schema Reference

The schema used in graph.sqlite is defined by schema.sql. Since Callgraph is new, the schema is fairly simple and is expected to evolve. If you have suggestions or use cases that require changes, please file a bug. (In particular, there are several pending improvements mentioned below.)

The schema contains the following tables:

node. This represents a function, and contains the following fields:
- id INTEGER PRIMARY KEY, a unique identifier for the node
- name TEXT, the fully-qualified function or method name, including return type and parameters. See below.
- isPtr INTEGER, 1 if the function call (in this case, the callee) is a function pointer, 0 otherwise.
- isVirtual INTEGER, 1 if the function call is a virtual method, 0 otherwise.
- loc TEXT, the fully-resolved source file containing the declaration or definition of the function. See below.
- UNIQUE (name, loc) ON CONFLICT IGNORE
The fully-qualified name includes the return type, namespaces, class name, method or function name, and parameter types. For instance, the code
```
namespace ns {
  class cs {
    int method(float, double);
  };
}
```
would result in a name of int ns::cs::method(float,double). For nonstatic methods, the implicit this parameter is excluded from the parameter list, though we may want to denote nonstatic methods with a separate field. The global namespace is indicated by a leading ::. (Types are currently not fully-qualified.)

The loc refers to the source file in which the declaration or definition appeared. For methods of classes, the declaration context will always be available, and this location will refer to the file containing the declaration (for instance, foo.h). For plain functions, this location may refer to either the declaration or the definition, so it cannot presently be used in concert with name as a reliably unique identifier. (This could potentially be solved by implementing linker-style name resolution rules in Callgraph.) For function pointers, it is impossible to determine a location, and loc will be the empty string. (You should guard on this case using isPtr.) For compiler built-ins, loc will be <built-in>.

Since symlinks are prevalent in the Mozilla build process, the path in loc is always fully-resolved; that is, it is an absolute path which is guaranteed to have a one-to-one mapping with a particular file in the source or object tree. For generated files, such as those from xpidlgen, it will point into the object tree. Note also that one can override gcc's notion of the current source file using the #line directive; in the likely case where the named file doesn't exist, the resulting loc will not be fully-resolved. Caveat emptor. Thankfully, this practice is uncommon.
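The special values of loc described above can be filtered out when only nodes backed by a real source file are wanted. The following is a minimal sketch in Python using the standard sqlite3 module. The CREATE TABLE statement is reconstructed from the field list in this article rather than copied from schema.sql, and the sample rows (including the spelling of the function pointer and built-in names, and the /src paths) are made up for illustration.

```python
import sqlite3

# Assumed DDL, reconstructed from the field list in this article
# (not copied from schema.sql).
NODE_DDL = """
CREATE TABLE node (
  id INTEGER PRIMARY KEY,
  name TEXT,
  isPtr INTEGER,
  isVirtual INTEGER,
  loc TEXT,
  UNIQUE (name, loc) ON CONFLICT IGNORE
)
"""


def nodes_with_source_file(conn):
    """Return (name, loc) for nodes whose loc names an actual file.

    Function pointers are excluded via isPtr (their loc is the empty
    string), and compiler built-ins are excluded via the <built-in> marker.
    """
    return conn.execute(
        "SELECT name, loc FROM node "
        "WHERE isPtr = 0 AND loc != '' AND loc != '<built-in>' "
        "ORDER BY name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(NODE_DDL)
# Hypothetical sample rows; real names and paths come from Callgraph.
conn.executemany(
    "INSERT INTO node (name, isPtr, isVirtual, loc) VALUES (?, ?, ?, ?)",
    [
        ("int ns::cs::method(float,double)", 0, 0, "/src/foo.h"),
        ("void (*)(int)", 1, 0, ""),
        ("void ::__builtin_trap()", 0, 0, "<built-in>"),
    ],
)
located = nodes_with_source_file(conn)
```

Checking both isPtr and the empty string is deliberately redundant here: the article recommends guarding on isPtr, and the extra loc test only protects against rows that lack a location for some other reason.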

The (name, loc) pair is intended to be a unique identifier for a given function or method. (This currently breaks down for plain functions and function pointers, as described above.) In the future, we may include a mangledName field in the table, which would allow more consistency with linker rules regarding function name resolution and uniqueness. We may also add a field indicating whether the function definition has been seen, which would distinguish, say, calls into library functions from functions that simply call nothing else.
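The effect of the UNIQUE (name, loc) ON CONFLICT IGNORE constraint, and the way it breaks down for plain functions, can be illustrated with a small sketch. As before, the table definition is an assumption reconstructed from the field list in this article, and the function name and paths are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed DDL, reconstructed from the field list in this article.
conn.execute(
    """
    CREATE TABLE node (
      id INTEGER PRIMARY KEY,
      name TEXT,
      isPtr INTEGER,
      isVirtual INTEGER,
      loc TEXT,
      UNIQUE (name, loc) ON CONFLICT IGNORE
    )
    """
)

NAME = "void ::f(int)"  # hypothetical global function
rows = [
    (NAME, 0, 0, "/src/f.h"),    # seen via its declaration
    (NAME, 0, 0, "/src/f.h"),    # exact duplicate: silently ignored
    (NAME, 0, 0, "/src/f.cpp"),  # same function, seen via its definition
]
for row in rows:
    conn.execute(
        "INSERT INTO node (name, isPtr, isVirtual, loc) VALUES (?, ?, ?, ?)",
        row,
    )

# The duplicate collapses, but the declaration/definition split does not.
node_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
locs = [
    r[0]
    for r in conn.execute(
        "SELECT loc FROM node WHERE name = ? ORDER BY loc", (NAME,)
    )
]
```

In this sketch a single function ends up as two nodes, which is why a query that needs one node per function may have to group by name alone until something like the proposed mangledName field exists.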
edge. This represents a call between functions, and contains the following fields:
- caller INTEGER REFERENCES node, the id of a node representing the caller function.
- callee INTEGER REFERENCES node, likewise for the callee.
- PRIMARY KEY(caller, callee) ON CONFLICT IGNORE
The primary key is currently unique on caller and callee, meaning that if the caller calls the callee multiple times, only one edge will exist in the table. We may change this if there are cases where the number of calls is relevant.
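A typical use of the edge table is to list the callees of a given function by joining against node twice. The sketch below also shows repeated calls collapsing into one edge. Both table definitions are assumptions reconstructed from the field lists in this article, and the three functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed DDL, reconstructed from the field lists in this article.
conn.executescript(
    """
    CREATE TABLE node (
      id INTEGER PRIMARY KEY,
      name TEXT,
      isPtr INTEGER,
      isVirtual INTEGER,
      loc TEXT,
      UNIQUE (name, loc) ON CONFLICT IGNORE
    );
    CREATE TABLE edge (
      caller INTEGER REFERENCES node,
      callee INTEGER REFERENCES node,
      PRIMARY KEY (caller, callee) ON CONFLICT IGNORE
    );
    """
)

# Hypothetical nodes with explicit ids so the edges below are easy to read.
conn.executemany(
    "INSERT INTO node (id, name, isPtr, isVirtual, loc) VALUES (?, ?, 0, 0, ?)",
    [
        (1, "int ::main()", "/src/main.cpp"),
        (2, "void ::helper()", "/src/helper.h"),
        (3, "void ::log(int)", "/src/log.h"),
    ],
)
# main calls helper twice; the second insert is ignored by the primary key.
conn.executemany(
    "INSERT INTO edge (caller, callee) VALUES (?, ?)",
    [(1, 2), (1, 2), (1, 3), (2, 3)],
)


def callees_of(conn, caller_name):
    """Return the names of all functions called by caller_name."""
    cur = conn.execute(
        "SELECT callee.name FROM edge "
        "JOIN node AS caller ON caller.id = edge.caller "
        "JOIN node AS callee ON callee.id = edge.callee "
        "WHERE caller.name = ? ORDER BY callee.name",
        (caller_name,),
    )
    return [r[0] for r in cur]


edge_count = conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]
main_callees = callees_of(conn, "int ::main()")
```

Swapping the roles of caller and callee in the WHERE clause gives the reverse query (who calls a given function).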
implementors. This table provides information about the inheritance chain of C++ classes, specifically which virtual interface methods are overridden or implemented by which classes.
- implementor TEXT, the fully-qualified class name of the class which implements the method.
- interface TEXT, the fully-qualified class name of the class which declares the method to be virtual (either pure or non-pure). See below.
- method TEXT, the method name (not including return type or arguments).
- loc TEXT, the fully-resolved source file containing the declaration of the interface class.
- id INTEGER PRIMARY KEY, a unique identifier for the entry.
- UNIQUE (implementor, interface, method, loc) ON CONFLICT IGNORE
A class is considered an interface for a method if it declares that method to be virtual (either pure or non-pure), while an implementor is a class that overrides the method with a non-pure declaration. (Whether the implementor actually defines the function is not considered.) For instance, consider the code
```
class iFoo {
  public:
    virtual void method1(float) = 0;
    virtual void method2(float) = 0;
};

class iBar : public iFoo {
  public:
    virtual void method1(float) = 0;
    virtual void method2(float);
};

class Bar : public iBar {
  public:
    virtual void method1(float);
    virtual void method2(float);
};
```
The class iBar would be considered an implementor of method2, while class Bar would be an implementor of both method1 and method2. For implementor iBar, iFoo::method2 would be considered an interface method. For implementor Bar, method1 and method2 on both iFoo and iBar would be listed as interface methods. Uniqueness is determined by implementor, interface, method, and loc. (Note that the parameter list of the virtual method is also required to distinguish overloads, and will be added.)
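The rows implied by the iFoo, iBar and Bar example can be written out and queried as follows. This is a sketch under stated assumptions: the table definition is reconstructed from the field list in this article, the loc value is a hypothetical path, and the class names are left unqualified for brevity, whereas real rows hold fully-qualified names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed DDL, reconstructed from the field list in this article.
conn.execute(
    """
    CREATE TABLE implementors (
      implementor TEXT,
      interface TEXT,
      method TEXT,
      loc TEXT,
      id INTEGER PRIMARY KEY,
      UNIQUE (implementor, interface, method, loc) ON CONFLICT IGNORE
    )
    """
)

LOC = "/src/foo.h"  # hypothetical file declaring the interface classes
conn.executemany(
    "INSERT INTO implementors (implementor, interface, method, loc) "
    "VALUES (?, ?, ?, ?)",
    [
        # iBar overrides only method2 with a non-pure declaration.
        ("iBar", "iFoo", "method2", LOC),
        # Bar overrides both methods; iFoo and iBar both count as interfaces.
        ("Bar", "iFoo", "method1", LOC),
        ("Bar", "iFoo", "method2", LOC),
        ("Bar", "iBar", "method1", LOC),
        ("Bar", "iBar", "method2", LOC),
    ],
)


def implementors_of(conn, interface, method):
    """Return the classes recorded as implementing interface::method."""
    cur = conn.execute(
        "SELECT implementor FROM implementors "
        "WHERE interface = ? AND method = ? ORDER BY implementor",
        (interface, method),
    )
    return [r[0] for r in cur]


foo_method1 = implementors_of(conn, "iFoo", "method1")
foo_method2 = implementors_of(conn, "iFoo", "method2")
```

A query of this shape could be used to expand a virtual call through iFoo into its possible concrete targets, though matching on the method name alone cannot tell overloads apart until the parameter list is stored.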

Contributors to this page: Sheppy, kscarfone, Dwitte

Last updated by: Sheppy, Apr 16, 2014, 12:20:47 PM