Layout System Overview

Layout's Job: Provide the Presentation

Layout is primarily concerned with providing a presentation to an HTML or XML document. This presentation is typically formatted in accordance with the requirements of the CSS1 and CSS2 specifications from the W3C. Presentation formatting is also required to provide compatibility with legacy browsers (Microsoft Internet Explorer and Netscape Navigator 4.x). The decision about when to apply CSS-specified formatting and when to apply legacy formatting is controlled by the document's DOCTYPE specification. These layout modes are referred to as 'Standards' and 'NavQuirks' modes. (DOCTYPE and modes are explained in more detail in article Mozilla's DOCTYPE sniffing.

Galley and Page Presentations: The presentation generally is constrained by the width of the window in which the presentation is to be displayed, and a height that extends as far as necessary. This is referred to as the Galley Mode presentation, and is what one expects from a typical browser. Additionally, layout must support a paginated presentation, where the width of the presentation is constrained by the dimensions of the printed output (paper) and the height of each page is fixed. This paged presentation presents several challenges not present in the galley presentation, namely how to break up elements that are larger than a single page, and how to handle changes to page dimensions.

The original design of the Layout system allowed for multiple, possibly different, presentations to be supported simultaneously from the same content model. In other words, the same HTML or XML document could be viewed as a normal galley presentation in a browser window, while simultaneously being presented in a paged presentation to a printer, or even an aural presentation to a speech-synthesizer. To date the only real use of this multiple presentation ability is seen in printing, where multiple presentations are managed, all connected to the same content model. (note: it is unclear if this is really a benefit - it may have been better to use a copy of the content model for each presentation, and to remove the complexity of managing separate presentations - analysis is needed here). The idea of supporting a non-visual presentation is interesting. Layout's support for aural presentations is undeveloped, though conceptually, it is possible and supported by the architecture.

How Layout Does its Job: Frames and Reflow

So, layout is concerned with providing a presentation, in galley or paged mode. Given a content model, how does the layout system actually create the presentation? Through the creation and management of frames. Frames are an encapsulation of a region on the screen, a region that contains geometry (size and location, stacking order). Generally frames correspond to the content elements, though there is often a one-to-many correspondence between content elements and frames. Layout creates frames for content based on either the specific HTML rules for an element or based on the CSS display type of the element. In the case of the HTML-specific elements, the frame types that correspond to the element are hard-coded, but in the more general case where the display type is needed, the layout system must determine that display type by using the Style System. A content element is passed to the Style System and a request is made to resolve the style for that element. This causes the Style System to apply all style rules that correspond to the element and results in a resolved Style Context - the style data specific to that element. The Layout module looks at the 'display' field of the style context to determine what kind of frame to create (block, inline, table, etc.). The style context is associated with the frame via a reference because it is needed for many other computations during the formatting of the frames.

Once a frame is created for a content element, it must be formatted. We refer to this as 'laying out' the frame, or as 'reflowing' the frame. Each frame implements a Reflow method to compute its position and size, among other things. For more details on the Reflow mechanism, see the Reflow Overview document... The CSS formatting requirements present two distinct layout models: 1) the in-flow model, where the geometry of an element is influenced by the geometry of the elements that precede it, and 2) the positioned model, where the geometry of an element is not influenced by the geometry of the elements that precede it, or in any case, is influenced more locally. The in-flow case is considered the 'normal' case, and corresponds to normal HTML formatting. The later case, called 'out of flow' puts the document author in control of the layout, and the author must specify the locations and sizes of all of the elements that are positioned. There is, of course, some complexity involved with managing these two models simultaneously...

So far the general flow of layout looks like this:

Obtain a document's content model
Utilize the Style System to resolve the style of each element in the content model
Construct the frames that correspond to the content model, according to the resolved style data.
Perform the initial layout, or initial reflow, on the newly constructed frame.

This is pretty straight-forward, but is complicated somewhat by the notion of incrementalism. One of the goals of the Layout system's design is to create parts of the presentation as they become available, rather than waiting for the entire document to be read, parsed, and then presented. This is a major benefit for large documents because the user does not have to wait for the 200th page of text to be read in before the first page can be displayed - they can start reading something right away. So really, this sequence of operations Resolve Style, Create Frame, Layout Frame, gets repeated many times as the content becomes available. In the normal in-flow case this is quite natural because the sequential addition of new content results in sequential addition of new frames, and because everything is in-flow, the new frames do not influence the geometry of the frames that have already been formatted. When out-of-flow frames are present this is a more difficult problem, however. Sometimes a content element comes in incrementally, and invalidates the formatting of some of the frames that precede it, frame that have already been formatted. In this case the Layout System has to detect that impact and reflow again the impacted frames. This is referred to as an incremental reflow.

Another responsibility of layout is to manage dynamic changes to the content model, changes that occur after the document has been loaded and (possibly) presented. These dynamic changes are caused by manipulations of the content model via the DOM (generally through JavaScript). When a content element is modified, added or removed, Layout is notified. If content elements have been inserted, new frames are created and formatted (and impacts to other frames are managed by performing further incremental reflows). If content is removed, the corresponding frames are destroyed (again, impacts to other elements are managed). If a content element is modified, layout must determine if the change influences the formatting of that or other elements' presentations, and must then reformat, or re-reflow, the impacted elements. In all cases, the determination of the impact is critical to avoid either the problem of not updating impacted elements, thus presenting an invalid presentation, or updating too much of the presentation and thus doing too much work, potentially causing performance problems.

One very special case of dynamic content manipulation is the HTML Editor. Layout is used to implement both a full-blown WYSIWYG HTML editor, and a single and multi-line text entry control. In both cases, the content is manipulated by the user (via the DOM) and the resulting visual impacts must be shown as quickly as possible, without disconcerting flicker or other artifacts that might bother the user. Consider a text entry field: the user types text into a form on the web. As the user types a new character it is inserted into the content model. This causes layout to be notified that a new piece of content has been entered, which causes Layout to create a new frame and format it. This must happen very fast, so the user's typing is not delayed. In the case of the WYSIWYG HTML editor, the user expects that the modifications they make to the document will appear immediately, not seconds later. This is especially critical when the user is typing into the document: it would be quite unusable if typing a character at the end of a document in the HTML editor caused the entire document to be reformatted - it would be too slow, at least on low-end machines. Thus the HTML editor and text controls put considerable performance requirements on the layout system's handling of dynamic content manipulation.

The Fundamentals of Frames: Block and Line

There are many types of frames that are designed to model various formatting requirements of different HTML and XML elements. CSS2 defines several (block, inline, list-item, marker, run-in, compact, and various table types) and the standard HTML form controls require their own special frame types to be formatted as expected. The most essential frame types are Block and Inline, and these correspond to the most important Layout concepts, the Block and Line.

A block is a rectangular region that is composed of one or more lines. A line is a single row of text or other presentational elements. All layout happens in the context of a block, and all of the contents of a block are formatted into lines within that block. As the width of a block is changed, the contents of the lines must be reformatted. Consider for example a large paragraph of text sitting in paragraph:

<p>
 We need documentation for users, web developers, and developers working
 on Mozilla. If you write your own code, document it. Much of the
 existing code <b>isn’t very well documented</b>. In the process of figuring 
 things out, try and document your discoveries. 
 If you’d like to contribute, let us know.
</p>

There is one block that corresponds to the <p> element, and then a number of lines of text that correspond to the text. As the width of the block changes (due to the window being resized, for example) the length of the lines within it changes, and thus more or less text appears on each line. The block is responsible for managing the lines. Note that lines may contain only inline elements, whereas block may contain both inline elements and other blocks.

Other Layout Models: XUL

In addition to managing CSS-defined formatting, the layout system provides a way to integrate other layout schemes into the presentation. Currently layout supports the formatting of XUL elements, which utilize a constraint-based layout language. The Box is introduced into the layout system's frame model via an adapter (BoxToBlockAdapter) that translates the normal layout model into the box formatting model. Conceptually, this could be used to introduce other layout systems, but it might be worth noting that there was no specific facility designed into the layout system to accommodate this. Layout deals with frames, but as long as the layout system being considered has no need to influence presentational elements from other layout systems, it can be adapted using a frame-based adapter, ala XUL.

Core Classes

At the highest level, the layout system is a group of classes that manages the presentation within a fixed width and either unlimited height (galley presentation) or discrete page heights (paged presentation). Digging just a tiny bit deeper into the system we find that the complexity (and interest) mushrooms very rapidly. The idea of formatting text and graphics to fit within a given screen area sounds simple, but the interaction of in-flow and out-of-flow elements, the considerations of incremental page rendering, and the performance concerns of dynamic content changes makes for a system that has a lot of work to do, and a lot of data to manage. Here are the high-level classes that make up the layout system. Of course this is a small percentage of the total classes in layout (see the detailed design documents for the details on all of the classes, in the context of their actual role).

Presentation Shell / Presentation Context

Together the presentation shell and the presentation context provide the root of the current presentation. The original design provided for a single Presentation Shell to manage multiple Presentation Contexts, to allow a single shell to handle multiple presentations. It is unclear if this is really possible, however, and in general it is assumed that there is a one-to-one correspondence between a presentation shell and a presentation context. The two classes should probably be folded into one, or the distinction between them formalized and utilized in the code. The Presentation Shell currently owns a controlling reference to the Presentation Context. Further references to the Presentation Shell and Presentation Context will be made by using the term Presentation Shell.

The Presentation Shell is the root of the presentation, and as such it owns and manages a lot of layout objects that are used to create and maintain a presentation (note that the DocumentViewer is the owner of the Presentation Shell, and in some cases the creator of the objects used by the Presentation Shell to manage the presentation. More details of the Document Viewer are needed here...). The Presentation Shell, or PresShell, is first and foremost the owner of the formatting objects, the frames. Management of the frames is facilitated by the Frame Manager, an instance of which the PresShell owns. Additionally, the PresShell provides a specialized storage heap for frames, called an Arena, that is used to make allocation and deallocation of frames faster and less likely to fragment the global heap.

The Presentation Shell also owns the root of the Style System, the Style Set. In many cases the Presentation Shell provides pass-through methods to the Style Set, and generally uses the Style Set to do style resolution and Style Sheet management.

One of the critical aspects of the Presentation Shell is the handling of frame formatting, or reflow. The Presentation Shell owns and maintains a Reflow Queue where requests for reflow are held until it is time to perform a reflow, and then pulled out and executed.

It is also important to see the Presentation Shell as an observer of many kinds of events in the system. For example, the Presentation Shell receives notifications of document load events, which are used to trigger updates to the formatting of the frames in some cases. The Presentation Shell also receives notifications about changes in cursor and focus states, whereby the selection and caret updates can be made visible.

There are dozens of other data objects managed by the Presentation Shell and Presentation Context, all necessary for the internal implementation. These data members and their roles will be discussed in the Presentation Shell design documents. For this overview, the Frames, Style Set, and Reflow Queue are the most important high-level parts of the Presentation Shell.

Frame Manager

The Frame Manager is used to, well, manage frames. Frames are basic formatting objects used in layout, and the Frame Manager is responsible for making frames available to clients. There are several collections of frames maintained by the Frame Manager. The most basic is a list of all of the frames starting at the root frame. Clients generally do not want to incur the expense of traversing all of the frames from the root to find the frame they are interested in, so the Frame Manager provides some other mappings based on the needs of the clients.

The most crucial mapping is the Primary Frame Map. This collection provides access to a frame that is designated as the primary frame for a piece of content. When a frame is created for a piece of content, it may be the 'primary frame' for that content element (content elements that require multiple frames have primary and secondary frames; only the primary frame is mapped). The Frame Manager is then instructed to store the mapping from a content element to the primary frame. This mapping facilitates updates to frames that result in changes to content (see discussion above).

Another important mapping maintained by the Frame Manager is that of the undisplayed content. When a content element is defined as having no display (via the CSS property 'display:none') it is noted by a special entry in the undisplayed map. This is important because no frame is generated for these elements yet changes to their style values and to the content elements still need to be handled by layout, in case their display state changes from 'none' to something else. The Undisplayed Map keeps track of all content and style data for elements that currently have no frames. (note: the original architecture of the layout system included the creation of frames for elements with no display. This changed somewhere along the line, but there is no indication of why the change was made. Presumably it is more time and space-efficient to prevent frame creation for elements with no display.)

The Frame Manager also maintains a list of Forms and Form Controls, as content nodes. This is presumably related to the fact that layout is responsible for form submission, but this is a design flaw that is being corrected by moving form submission into the content world. These collections of Forms and Form Controls should be removed eventually.

CSS Frame Constructor

The Frame Constructor is responsible for resolving style values for content elements and creating the appropriate frames corresponding to that element. In addition to managing the creation of frames, the Frame Constructor is responsible for managing changes to the frames. Frame Construction is generally achieved by the use of stateless methods, but in some cases there is the need to provide context to frames created as children of a container. The Frame Manager uses the Frame Constructor State class to manage the important information about the container of a frame being created (and lots of other state-stuff too - need to describe more fully).

Frame

The Frame is the most fundamental layout object. The class nsFrame is the base class for all frames, and it inherits from the base class nsIFrame (note: nsIFrame is NOT an interface, it is an abstract base class. It was once an interface but was changed to be a base class when the Style System was modified - the name was not changed to reflect that it is not an interface). The Frame provides generic functionality that can be used by subclasses but cannot itself be instantiated.

nsFrame:
The Frame provides a mechanism to navigate to a parent frame as well as child frames. All frames have a parent except for the root frame. The Frame is able to provide a reference to its parent and to its children upon request. The basic data that all frames maintain include: a rectangle describing the dimensions of the frame, a pointer to the content that the frame is representing, the style context representing all of the stylistic data corresponding to the frame, a parent frame pointer, a sibling frame pointer, and a series of state bits.

Frames are chained primarily by the sibling links. Given a frame, one can walk the sibling of that frame, and can also navigate back up to the parent frame. Specializations of the frame also allow for the management of child frames; this functionality is provided by the Container Frame.

Container Frames:
The Container Frame is a specialization of the base frame class that introduces the ability to manage a list of child frames. All frames that need to manage child frames (e.g. frames that are not themselves leaf frames) derive from Container Frame.

Block Frame

Reflow State

Reflow Metrics

Space Manager

StyleSet

StyleContext

Document History

05/20/2002 - Marc Attinasi: created, wrote highest level introduction to general layout concepts, links to relevant specs and existing documents.

Original Document Information

Author(s): Marc Attinasi
Last Updated Date: November 20, 2005