Status of this document
This is a partial reverse-engineering of the libIDL source code's parser, limited mostly to the subset of functionality that is supported by the Mozilla xpidl binary.
Purpose of this document
This document is not an introduction to XPIDL or IDL in general. It is more focused on XPIDL syntax and grammar. See XPIDL Main Page for more links and introductory content.
Simplifications, conventions and notation
The syntax is specified according to ABNF as defined by RFC 5234, although a few productions use prose for clarity of understanding.
Lexically, tokens are delimited by whitespace (defined here as spaces, tabs, vertical tabs, form feeds, line feeds, and carriage returns, or [ \t\v\f\r\n] in regular expression form). LibIDL only considers a single line feed as a newline, and not carriage returns (although xpidl begs to differ). Additionally, both C-style (/* ... */) and C++-style (// ... end-of-line) comments are permitted between any two tokens.
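The lexical rules above can be sketched in Python as follows. This is only an illustration, not how libIDL or xpidl is implemented: the function names are hypothetical, and the second function yields whitespace-delimited chunks rather than real tokens (it does not separate punctuation such as `;` from a neighbouring word, and it would split a string literal that contains spaces).

```python
import re

# Matches, in order: a string literal, a char literal, a C-style comment,
# or a C++-style comment. Literals are matched only so that comment
# markers inside them are left alone.
_SKIP = re.compile(
    r'"[^"\n]*["\n]'      # string literal (kept as-is)
    r"|'[^'\n]*['\n]"     # char literal (kept as-is)
    r"|/\*.*?\*/"         # C-style comment, may span lines
    r"|//[^\n]*",         # C++-style comment, runs to end of line
    re.DOTALL,
)

def strip_comments(source):
    """Replace each comment with one space so neighbouring tokens stay apart."""
    def repl(match):
        text = match.group(0)
        return " " if text.startswith("/") else text
    return _SKIP.sub(repl, source)

def split_on_whitespace(source):
    """Drop comments, then split on the whitespace set described above."""
    chunks = re.split(r"[ \t\v\f\r\n]+", strip_comments(source))
    return [chunk for chunk in chunks if chunk]
```

For example, `split_on_whitespace("interface /* c */ nsIFoo // trailing\n{ };")` yields the chunks `interface`, `nsIFoo`, `{` and `};`.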
Some productions can only occur at the beginning of lines; to simplify the grammar, I will not mention them in the grammar, especially since they are handled as a preprocessing step before the IDL source code is actually parsed.
- A `%{' that appears at the beginning of a line is the start of a raw code fragment, which extends until the end of a line that begins with `%}'. Text inside raw code fragments is not otherwise parsed by xpidl directly. The opening `%{' may be followed by the language, as in `%{C++', to output the raw fragment only in the specified language.
- A `#include "file"' line instructs the xpidl processor to include that file in the same sense that the C preprocessor includes a file. Note that includes within comments or raw code fragments are not processed by xpidl. Unlike the C preprocessor, when a file is included multiple times, it acts as if the subsequent includes did not happen; this removes the need for include guards.
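The two line-oriented rules above can be sketched as a small preprocessing pass. This is a minimal sketch under stated assumptions, not the actual libIDL or xpidl code: `preprocess` and the in-memory `files` mapping are hypothetical, and the sketch does not track comments, so an include inside a comment would still be expanded here.

```python
def preprocess(name, files, seen=None):
    """Expand each #include at most once and pass raw fragments through untouched.

    `files` is a hypothetical mapping of file name to source text that
    stands in for the file system.
    """
    if seen is None:
        seen = set()
    seen.add(name)
    out = []
    in_fragment = False
    for line in files[name].split("\n"):
        if in_fragment:
            # Inside %{ ... %}: copy verbatim, includes are not processed.
            out.append(line)
            if line.startswith("%}"):
                in_fragment = False
        elif line.startswith("%{"):
            in_fragment = True
            out.append(line)
        elif line.startswith('#include "'):
            target = line.split('"')[1]
            # A repeated include acts as if it did not happen.
            if target not in seen:
                out.extend(preprocess(target, files, seen))
        else:
            out.append(line)
    return out
```

Given a file that includes `b.idl` twice and also contains `#include "c.h"` inside a `%{C++` fragment, the result holds the contents of `b.idl` once and keeps the fragment line unchanged.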
XPIDL Syntax (ABNF)
The root production here is idl_file.
idl_file = 1*definition
definition = [type_decl / const_decl / interface] ";"
interface = [prop_list] "interface" ident [[inheritance] "{" *(ifacebody) "}"]
inheritance = ":" *(scoped_name ",") scoped_name
ifacebody = [type_decl / op_decl / attr_decl / const_decl] ";" / codefrag
type_decl = [prop_list] "typedef" type_spec *(ident ",") ident
type_decl /= [prop_list] "native" ident [parens]
const_decl = "const" type_spec ident "=" expr
op_decl = [prop_list] (type_spec / "void") ident parameter_decls [raise_list]
parameter_decls = "(" [*(param_decl ",") param_decl] ")"
param_decl = [prop_list] ("in" / "out" / "inout") type_spec ident
attr_decl = [prop_list] ["readonly"] "attribute" type_spec *(ident ",") ident
; Listed from lowest to highest precedence
expr = expr ("|" / "^" / "&") expr ; Unequal precedence: "|" is lowest, then "^", then "&"
expr /= expr ("<<" / ">>") expr
expr /= expr ("+" / "-") expr
expr /= expr ("*" / "/" / "%") expr
expr /= ["-" / "+" / "~"] (scoped_name / literal / "(" expr ")" )
; Numeric literals: quite frankly, I'm sure you know how these kinds of
; literals work, and these are annoying to specify in ABNF.
literal = octal_literal / decimal_literal / hex_literal / floating_literal
literal /= string_literal / char_literal
literal /= "TRUE" / "FALSE"
; In regex: /"[^"\n]*["\n]/. Yes, newline terminates.
string_literal = 1*(%x22 *(any char except %x22 or %x0a) (%x22 / %x0a))
; Same as above, but s/"/'/g
char_literal = 1*(%x27 *(any char except %x27 or %x0a) (%x27 / %x0a))
type_spec = "float" / "double" / "string" / "wstring"
type_spec /= ["unsigned"] ("short" / "long" / "long" "long")
type_spec /= "char" / "wchar" / "boolean" / "octet"
type_spec /= scoped_name
prop_list = "[" *(property ",") property "]"
property = ident [parens]
raise_list = "raises" "(" *(scoped_name ",") scoped_name ")"
scoped_name = *(ident "::") ident / "::" ident
; In regex: [A-Za-z_][A-Za-z0-9_]*; identifiers beginning with _ cause warnings
ident = (%x41-5a / %x61-7a / "_") *(%x41-5a / %x61-7a / %x30-39 / "_")
parens = "(" 1*(any char except ")") ")"
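To make the operator precedence in the expr productions concrete, here is a small precedence-climbing evaluator. It is a sketch under stated assumptions, not libIDL's implementation: it handles only integer literals, parentheses, and a single optional unary operator (no scoped names, floating-point, string, or character literals), all function names are hypothetical, and `/` and `%` follow Python semantics, which differ from C for negative operands.

```python
import operator
import re

# Binary operators mapped to binding strength; a higher number binds tighter.
PRECEDENCE = {"|": 1, "^": 2, "&": 3, "<<": 4, ">>": 4,
              "+": 5, "-": 5, "*": 6, "/": 6, "%": 6}
OPS = {"|": operator.or_, "^": operator.xor, "&": operator.and_,
       "<<": operator.lshift, ">>": operator.rshift,
       "+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv, "%": operator.mod}
TOKEN = re.compile(r"\s*(0[xX][0-9A-Fa-f]+|\d+|<<|>>|[-+~|^&*/%()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def literal_value(token):
    if token[:2].lower() == "0x":
        return int(token, 16)
    if len(token) > 1 and token[0] == "0":
        return int(token, 8)  # a leading zero means octal, as in C
    return int(token, 10)

def parse_primary(tokens):
    token = tokens.pop(0)
    if token == "(":
        value = parse_expr(tokens, 1)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("expected ')'")
        return value
    return literal_value(token)

def parse_operand(tokens):
    # The grammar allows one optional unary operator before a primary.
    unary = tokens.pop(0) if tokens and tokens[0] in ("-", "+", "~") else None
    value = parse_primary(tokens)
    if unary == "-":
        return -value
    if unary == "~":
        return ~value
    return value

def parse_expr(tokens, min_prec=1):
    left = parse_operand(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Requiring a strictly higher level on the right gives left associativity.
        right = parse_expr(tokens, PRECEDENCE[op] + 1)
        left = OPS[op](left, right)
    return left

def evaluate(text):
    tokens = tokenize(text)
    value = parse_expr(tokens)
    if tokens:
        raise ValueError("trailing input")
    return value
```

With this table, `evaluate("1 << 2 + 3")` groups as `1 << (2 + 3)`, and `evaluate("1 | 2 ^ 3 & 4")` groups as `1 | (2 ^ (3 & 4))`.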
Functionality not used in xpidl
The libIDL parser we use accepts more syntax than xpidl itself can understand. The following features are parseable but may not result in the expected code:
- Struct, union, and enumerated types
- Array declarators (appears to be supported in xpidl_header.c but not xpidl_typelib.c)
- Exception declarations
- Module declarations
- Variable arguments (that makes the ABNF get more wonky)
- Sequence types
- Max-length strings
- Fixed-point numbers
- "any" and "long double" types.
Pyxpidl syntax
idlfile = *(CDATA / INCLUDE / interface / typedef / native)
typedef = "typedef" IDENTIFIER IDENTIFIER ";"
native = [attributes] "native" IDENTIFIER "(" NATIVEID ")"
interface = [attributes] "interface" IDENTIFIER [ifacebase] [ifacebody] ";"
ifacebase = ":" IDENTIFIER
ifacebody = "{" *(member) "}"
member = CDATA / "const" IDENTIFIER IDENTIFIER "=" number ";"
member /= [attributes] ["readonly"] "attribute" IDENTIFIER IDENTIFIER ";"
member /= [attributes] IDENTIFIER IDENTIFIER "(" paramlist ")" raises ";"
paramlist = [param *("," param)]
raises = ["raises" "(" IDENTIFIER *("," IDENTIFIER) ")"]
attributes = "[" attribute *("," attribute) "]"
attribute = (IDENTIFIER / CONST) ["(" (IDENTIFIER / IID) ")"]
param = [attributes] ("in" / "out" / "inout") IDENTIFIER IDENTIFIER
number = NUMBER / IDENTIFIER
number /= "(" number ")"
number /= "-" number
number /= number ("+" / "-" / "*") number
number /= number ("<<" / ">>") number
number /= number "|" number
; Lexical tokens, I'm going to specify these in regex form
NUMBER = /-?\d+|0x[0-9A-Fa-f]+/
CDATA = /%{[ ]*C\+\+[ ]*\n(.*?\n?)%}[ ]*(C\+\+)?/s
INCLUDE = /\#include[ \t]+"[^"\n]+"/
NATIVEID = /[^()\n]+(?=\))/
IID = /[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}/
IDENTIFIER = /unsigned long long|unsigned short|unsigned long|long long|[A-Za-z][A-Za-z_0-9]*/