Introduction
Low-level languages like C have low-level memory management primitives such as malloc() and free(). JavaScript values, by contrast, are allocated when things (objects, strings, etc.) are created and "automatically" freed when they are no longer used. The latter process is called garbage collection. The word "automatically" is a source of confusion: it gives developers in JavaScript (and other high-level languages) the impression that they can decide not to care about memory management. This is a mistake.
Memory life cycle
Regardless of the programming language, the memory life cycle is nearly always the same:
- Allocate the memory you need
- Use the allocated memory (read, write)
- Release the allocated memory when it is not needed anymore
The first and second parts are explicit in all languages. The last part is explicit in low-level languages, but is mostly implicit in high-level languages like JavaScript.
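The three phases can be seen in a short sketch. The function name sumOfSquares is hypothetical and chosen only for illustration; the point is that the third phase has no corresponding line of code in JavaScript.

```javascript
// Hypothetical helper, named for illustration only.
function sumOfSquares(count) {
  // 1. Allocate: the engine reserves memory for the array.
  const squares = new Array(count);

  // 2. Use: write to, then read from, the allocated memory.
  for (let i = 0; i < count; i++) {
    squares[i] = i * i;
  }
  let sum = 0;
  for (const value of squares) {
    sum += value;
  }

  // 3. Release: nothing to write here. Once the function returns,
  // `squares` is no longer reachable and the engine may reclaim it.
  return sum;
}

const total = sumOfSquares(4); // 0 + 1 + 4 + 9
```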
Allocation in JavaScript
Value initialization
So that the programmer does not have to deal with allocations, JavaScript allocates memory when values are declared.
var n = 123; // allocates memory for a number
var s = "azerty"; // allocates memory for a string

var o = {
  a: 1,
  b: null
}; // allocates memory for an object and contained values

// (like object) allocates memory for the array and
// contained values
var a = [1, null, "abra"];

function f(a){
  return a + 2;
} // allocates a function (which is a callable object)

// function expressions also allocate an object
someElement.addEventListener('click', function(){
  someElement.style.backgroundColor = 'blue';
}, false);
Allocation via function calls
Some function calls result in object allocation.
var d = new Date(); // allocates a Date object

var e = document.createElement('div'); // allocates a DOM element
Some methods allocate new values or objects:
var s = "azerty";
var s2 = s.substr(0, 3); // s2 is a new string
// Since strings are immutable values,
// JavaScript may decide not to allocate memory,
// but just store the [0, 3] range.

var a = ["ouais ouais", "nan nan"];
var a2 = ["generation", "nan nan"];
var a3 = a.concat(a2);
// new array with 4 elements being
// the concatenation of a and a2 elements
Using values
Using values basically means reading and writing in allocated memory. This can be done by reading or writing the value of a variable or an object property, or even by passing an argument to a function.
Release when the memory is not needed anymore
Most memory management issues arise in this phase. The hardest task is determining when "the allocated memory is not needed any longer". It often requires the developer to determine where in the program a piece of memory is no longer needed and to free it.
High-level languages embed a piece of software called a "garbage collector", whose job is to track memory allocation and use in order to find when a piece of allocated memory is no longer needed, in which case it automatically frees it. This process is an approximation, since the general problem of knowing whether some piece of memory is needed is undecidable (it cannot be solved by an algorithm).
Garbage collection
As stated above, the general problem of automatically finding whether some memory "is not needed anymore" is undecidable. As a consequence, garbage collectors implement a restriction of a solution to the general problem. This section explains the notions needed to understand the main garbage collection algorithms and their limitations.
References
The main notion garbage collection algorithms rely on is that of a reference. Within the context of memory management, an object is said to reference another object if the former has access to the latter (either implicitly or explicitly). For instance, a JavaScript object has a reference to its prototype (implicit reference) and to its property values (explicit reference).
In this context, the notion of "object" is extended to something broader than regular JavaScript objects and also contains function scopes (or the global lexical scope).
Reference-counting garbage collection
This is the most naive garbage collection algorithm. It reduces the definition of "an object is not needed anymore" to "an object has no other object referencing it". An object is considered garbage-collectable if there are zero references pointing at it.
Example
var o = {
  a: {
    b: 2
  }
};
// 2 objects are created. One is referenced by the other as one of its properties.
// The other is referenced by virtue of being assigned to the 'o' variable.
// Obviously, none can be garbage-collected

var o2 = o; // the 'o2' variable is the second thing that
            // has a reference to the object
o = 1;      // now, the object that was originally in 'o' has a unique reference
            // embodied by the 'o2' variable

var oa = o2.a; // reference to 'a' property of the object.
               // This object has now 2 references: one as a property,
               // the other as the 'oa' variable

o2 = "yo"; // The object that was originally in 'o' has now zero
           // references to it. It can be garbage-collected.
           // However what was its 'a' property is still referenced by
           // the 'oa' variable, so it cannot be freed

oa = null; // what was the 'a' property of the object originally in o
           // has zero references to it. It can be garbage collected.
Limitation: cycles
There is a limitation when it comes to cycles. In the following example, two objects are created and reference one another, thus creating a cycle. They go out of scope once the function call returns, so they are effectively useless and could be freed. However, the reference-counting algorithm considers that, since each of the two objects is referenced at least once, neither can be garbage-collected.
function f(){
  var o = {};
  var o2 = {};
  o.a = o2; // o references o2
  o2.a = o; // o2 references o

  return "azerty";
}

f();
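To see why the counts never reach zero, here is a toy model of reference counting. It is a sketch for illustration, not how any real engine is implemented, and every name in it (makeNode, addRef, removeRef, owner, owned) is hypothetical. A reference held by a variable is modelled by passing null as the source.

```javascript
// Toy model of reference counting; all names are hypothetical.
function makeNode(label) {
  return { label: label, refCount: 0, refs: [], freed: false };
}

// Record that `from` (or a variable, when `from` is null) points at `to`.
function addRef(from, to) {
  to.refCount++;
  if (from) from.refs.push(to);
}

// Drop one reference to `to`; free it when the count reaches zero.
function removeRef(from, to) {
  if (from) from.refs.splice(from.refs.indexOf(to), 1);
  to.refCount--;
  if (to.refCount === 0) {
    to.freed = true;
    // Freeing an object also drops the references it held.
    for (const held of to.refs.slice()) removeRef(to, held);
  }
}

// Acyclic case: a variable holds `owner`, which holds `owned`.
const owner = makeNode("owner");
const owned = makeNode("owned");
addRef(null, owner);
addRef(owner, owned);
removeRef(null, owner); // the variable goes away; both are freed

// Cyclic case, mirroring f() above: o and o2 reference each other.
const o = makeNode("o");
const o2 = makeNode("o2");
addRef(null, o);     // var o
addRef(null, o2);    // var o2
addRef(o, o2);       // o.a = o2
addRef(o2, o);       // o2.a = o
removeRef(null, o);  // f() returns: the local variables disappear
removeRef(null, o2);
// Both counts are stuck at 1, so neither object is ever freed.
```

In the acyclic case, dropping the last outside reference cascades and frees both objects. In the cyclic case, each object keeps the other's count at 1, which is the leak described above.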
Real-life example
Internet Explorer 6 and 7 are known to have reference-counting garbage collectors for DOM objects. Cycles are a common mistake that can generate memory leaks:
var div;
window.onload = function(){
  div = document.getElementById("myDivElement");
  div.circularReference = div;
  div.lotsOfData = new Array(10000).join("*");
};
In the above example, the DOM element "myDivElement" has a circular reference to itself in the "circularReference" property. If the property is not explicitly removed or nulled, a reference-counting garbage collector will always have at least one reference intact and will keep the DOM element in memory even if it was removed from the DOM tree. If the DOM element holds lots of data (illustrated in the above example with the "lotsOfData" property), the memory consumed by this data will never be released.
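A minimal sketch of the explicit cleanup mentioned above follows. The helper name releaseElement is hypothetical, and a plain object stands in for the DOM element so that the sketch runs outside a browser; with a real element the same two assignments would apply.

```javascript
// Hypothetical cleanup helper; `element` stands for any object that was
// given the custom properties from the example above.
function releaseElement(element) {
  element.circularReference = null; // break the self-reference
  element.lotsOfData = null;        // drop the large string as well
}

// A plain object stands in for the DOM element so the sketch runs anywhere.
const fakeDiv = {};
fakeDiv.circularReference = fakeDiv;
fakeDiv.lotsOfData = new Array(10000).join("*");

releaseElement(fakeDiv);
```

Once the self-reference is gone, a reference-counting collector has no remaining count keeping the element alive after it is removed from the DOM tree and the variable holding it is cleared.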
Mark-and-sweep algorithm
This algorithm reduces the definition of "an object is not needed anymore" to "an object is unreachable".
This algorithm assumes the knowledge of a set of objects called roots (in JavaScript, the root is the global object). Periodically, the garbage collector starts from these roots, finds all objects that are referenced from them, then all objects referenced from those, and so on. Starting from the roots, the garbage collector thus finds all reachable objects and collects all non-reachable objects.
This algorithm is better than the previous one, since an object with zero references is necessarily unreachable. The converse is not true, as we have seen with cycles.
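The traversal can be sketched over a plain object graph. This is a toy model under stated assumptions, not an engine implementation: the function markAndSweep, the heap array listing every allocated object, and the refs property on each object are all hypothetical.

```javascript
// Toy mark-and-sweep over a plain object graph; hypothetical names throughout.
// `heap` lists every allocated object, `roots` the starting points.
function markAndSweep(heap, roots) {
  const marked = new Set();
  const pending = roots.slice();

  // Mark phase: follow references outward from the roots.
  while (pending.length > 0) {
    const current = pending.pop();
    if (marked.has(current)) continue;
    marked.add(current);
    for (const next of current.refs) pending.push(next);
  }

  // Sweep phase: keep what was marked; everything else is garbage.
  return {
    live: heap.filter((obj) => marked.has(obj)),
    garbage: heap.filter((obj) => !marked.has(obj)),
  };
}

const globalObject = { label: "global", refs: [] };
const kept = { label: "kept", refs: [] };
const cycleA = { label: "cycleA", refs: [] };
const cycleB = { label: "cycleB", refs: [] };

globalObject.refs.push(kept);
cycleA.refs.push(cycleB); // cycleA and cycleB reference each other,
cycleB.refs.push(cycleA); // but nothing reachable points at them

const result = markAndSweep(
  [globalObject, kept, cycleA, cycleB],
  [globalObject]
);
const garbageLabels = result.garbage.map((obj) => obj.label);
```

The two objects in the cycle are never visited during the mark phase, so they end up in the garbage list even though each still has one reference pointing at it.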
As of 2012, all modern browsers ship a mark-and-sweep garbage-collector. All improvements made in the field of JavaScript garbage collection (generational/incremental/concurrent/parallel garbage collection) over the last few years are implementation improvements of this algorithm, but not improvements over the garbage collection algorithm itself nor its reduction of the definition of when "an object is not needed anymore".
Cycles are not a problem anymore
In the first example above, after the function call returns, the two objects are no longer referenced by anything reachable from the global object. Consequently, they will be found unreachable by the garbage collector.
The same goes for the second example. Once the div is made unreachable from the roots, it can be garbage-collected despite holding a reference to itself.
Limitation: objects need to be made explicitly unreachable
Although this is marked as a limitation, it is one that is rarely reached in practice, which is why no one usually cares that much about garbage collection.
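One situation where it does matter is a long-lived collection that stays reachable from the roots. The sketch below assumes a hypothetical sessions registry with hypothetical openSession and closeSession helpers; it only illustrates that an entry remains reachable until it is explicitly removed.

```javascript
// Hypothetical long-lived registry: reachable from the roots for the
// whole life of the program.
const sessions = new Map();

function openSession(id) {
  // The session object stays reachable through `sessions`.
  sessions.set(id, { id: id, buffer: new Array(1000).fill(0) });
}

function closeSession(id) {
  // Removing the entry is what makes the session unreachable;
  // only then is the collector allowed to reclaim its buffer.
  return sessions.delete(id);
}

openSession("a");
openSession("b");
const removed = closeSession("a");
```

Session "b" is never closed, so its buffer stays in memory for as long as the registry itself is reachable.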