In practice, very few XHTML documents are served over the Web with the correct MIME media type, application/xhtml+xml. Whilst authored to the stricter rules of XML, they are sent with the media type for HTML (text/html). The receiving browser therefore treats the content as HTML and does not utilise its XML parser.
There are a number of reasons for this. Partly it is because, prior to version 9, Internet Explorer was incapable of handling XHTML sent with the official XHTML media type at all. (Rather than displaying content, it would present the user with a file download dialog.) But it is also rooted in the experience that JavaScript, authored carefully for HTML, can break when placed within an XML environment.
This article shows some of the reasons, along with strategies to remedy the problems. It will encourage web authors to use more XML features and make their JavaScript interoperable with real XHTML applications.
(Note that XHTML documents which behave correctly in both application/xhtml+xml and text/html environments are sometimes known as 'polyglot' documents.)
To test the following examples locally, take advantage of the fact that Firefox chooses its parser for local files by file extension. Just write an ordinary (X)HTML file and save it once as test.html and once as test.xhtml.
Problem: Nothing Works
After switching the MIME type, suddenly no inline script works anymore. Even the plain old alert() method is gone. The code looks something like this:
<script type="text/javascript">
//<!--
window.alert("Hello World!");
//-->
</script>
Solution: The CDATA Trick
This problem usually arises when inline scripts are wrapped in comments. This was common practice in HTML, to hide the scripts from browsers not capable of JS. In the age of XML, comments are what they were intended to be: comments. The XML parser does not pass the contents of a comment on as script content, so enclosing your script in one is like throwing your lunch in a piranha pool. Moreover, there's really no point in commenting out your scripts: no browser written in the last ten years will display your code on the page.
The easy solution is to do away with the commenting entirely:
<script type="text/javascript">
window.alert("Hello World!");
</script>
This will work so long as your code doesn't contain characters which are "special" in XML, which usually means < and &. If your code contains either of these, you can work around this with CDATA sections:
<script type="text/javascript">
<![CDATA[
// is the variable a non-negative integer less than 10?
if (variable < 10 && variable >= 0)
action();
]]>
</script>
Note that the CDATA section is only necessary because of the < and && in the code; otherwise you could have left it out.
A third solution is to use only external scripts, neatly sidestepping the special-character problem.
Alternatively, the CDATA section can be couched within comments so as to be able to work in either application/xhtml+xml or text/html:
<script type="text/javascript">
//<![CDATA[
...
//]]>
</script>
<!-- (For styles, it is different) -->
<style type="text/css">
/*<![CDATA[*/
...
/*]]>*/
</style>
And if you really need compatibility with very old browsers that do not recognize the script or style tags, and would therefore display their contents on the page, you can use this:
<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>
<!-- (For styles, it is different) -->
<style type="text/css"><!--/*--><![CDATA[/*><!--*/
...
/*]]>*/--></style>
See this document for more on the issues related to application/xhtml+xml and text/html (at least as far as XHTML 1.* and HTML 4 are concerned; HTML5 addresses many of these problems).
Problem: Names in XHTML and HTML are represented in different cases
Scripts that call getElementsByTagName() with an upper case HTML name no longer work, and properties like nodeName or tagName return upper case names in HTML and lower case names in XHTML.
Solution: Use or convert to lower case
For methods like getElementsByTagName(), passing the name in lower case will work in both HTML and XHTML. For name comparisons, convert the name to lower case before comparing (e.g., "el.nodeName.toLowerCase() === 'html'"). This ensures that the comparison succeeds in HTML documents and does no harm in XHTML, where the names are already lower case.
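The comparison can be wrapped in a small helper so the lower-casing is not forgotten. This is only a sketch: hasTagName is a hypothetical name, not part of any DOM API, and it assumes the node exposes a string nodeName.

```javascript
// Hypothetical helper (not part of the DOM): case-insensitive element name test.
function hasTagName(node, name) {
  // nodeName is upper case in HTML documents and lower case in XHTML documents,
  // so normalise both sides before comparing.
  return node.nodeName.toLowerCase() === name.toLowerCase();
}

// Usage in a browser, for example:
// if (hasTagName(document.documentElement, "html")) { /* ... */ }
```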
Problem: My Cookie Isn't Saved!
We have already found out that the document object in XML files is different from the one in HTML files. Now we take a look at one widely used property that is missing in XML files. In XML documents there is no document.cookie. That is, you can write something like
document.cookie = "key=value";
in XML as well, but nothing is saved in cookie storage.
Solution: Use the Storage Object
Firefox 2 enabled a new feature, the HTML 5 Storage object. Although this feature is not without its critics, you can use it to work around the missing cookie if your document is of type XML. Again, you will have to write your own wrapper to cope with every combination of MIME type and browser.
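Such a wrapper might look roughly like the sketch below. The name createStore and its set/get methods are hypothetical. It assumes the caller passes in whatever Storage-like object the browser offers (anything with setItem and getItem), or null if there is none, in which case it falls back to the cookie of the given document.

```javascript
// Hypothetical wrapper: prefer a Storage-like object, fall back to document.cookie.
// 'storage' is any object with setItem/getItem (or null); 'doc' is the document.
function createStore(storage, doc) {
  if (storage && typeof storage.setItem === "function") {
    return {
      set: function (key, value) { storage.setItem(key, String(value)); },
      get: function (key) { return storage.getItem(key); }
    };
  }
  return {
    set: function (key, value) {
      doc.cookie = encodeURIComponent(key) + "=" + encodeURIComponent(value);
    },
    get: function (key) {
      var pairs = doc.cookie ? doc.cookie.split("; ") : [];
      for (var i = 0; i < pairs.length; i++) {
        var index = pairs[i].indexOf("=");
        if (index === -1) continue; // skip malformed entries
        if (decodeURIComponent(pairs[i].slice(0, index)) === key) {
          return decodeURIComponent(pairs[i].slice(index + 1));
        }
      }
      return null; // key not found
    }
  };
}
```

Passing the storage object in, rather than looking it up inside the function, keeps the browser detection in one place and makes the wrapper easy to try out with stand-in objects.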
Problem: I Can't Use document.write()
This problem has the same cause as the one above: the method is not available in XML documents. There are reasons for this decision, one being that a single string of invalid markup would instantly break the whole document.
Solution: Use DOM Methods
Many people avoided DOM methods because of the amount of typing needed to create one simple element when document.write() did the job. Now you can't get away with that as easily as before. Use DOM methods to create all of your elements, attributes and other nodes. This is XML-proof, as long as you keep namespaces in mind (e.g., there is a document.createElementNS method).
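As an illustration, a heading could be built with namespace-aware DOM methods roughly as follows. This is a minimal sketch: createHeading and XHTML_NS are names made up for this example, and the document is passed in as a parameter so the function does not depend on a global.

```javascript
var XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: builds <h1>text</h1> with DOM methods instead of document.write().
// 'doc' is the document to create the nodes in (normally the global 'document').
function createHeading(doc, text) {
  var heading = doc.createElementNS(XHTML_NS, "h1");
  heading.appendChild(doc.createTextNode(text));
  return heading;
}

// Usage in a browser, for example:
// var body = document.getElementsByTagNameNS(XHTML_NS, "body")[0];
// body.appendChild(createHeading(document, "Hello World!"));
```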
Of course, you can still use strings like in document.write(), but it takes a little more effort. For example:
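var string = '<div xmlns="http://www.w3.org/1999/xhtml"><h1>Hello World!</h1></div>';
var parser = new DOMParser();
var parsedDocument = parser.parseFromString(string, "text/xml");
// parseFromString() returns a whole document, so import its root element before appending
body.appendChild(document.importNode(parsedDocument.documentElement, true)); // assuming 'body' is the body element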
But be aware that if your string is not well-formed XML (e.g., you have an & where it should not be), parsing will fail and you will be left with a parser error instead of the markup you wanted.
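One way to reduce that risk is to escape any text you splice into the markup string before parsing it. The following is a sketch under the assumption that the value is plain text meant for element content or an attribute value; escapeXml is a hypothetical helper, not a built-in function.

```javascript
// Hypothetical helper: escape text so it can be embedded safely in an XML string.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

var title = escapeXml("Fish & Chips <today>");
var markup = '<div xmlns="http://www.w3.org/1999/xhtml"><h1>' + title + "</h1></div>";
```

The markup itself still has to be well-formed; escaping only protects the text inserted into it.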
Problem: I want to remain forward compatible!
HTML is moving away from formatting attributes, and XHTML may eventually become more prominent (or a document author may at least want to make documents available as XHTML later, for browsers that support it). One may therefore wish to avoid features that are unlikely to stay compatible in the future.
Solution: Avoid HTML-specific DOM
The HTML DOM, even though it is compatible with XHTML 1.0, is not guaranteed to work with future versions of XHTML (perhaps especially the formatting properties, which have been deprecated as element attributes). The regular XML DOM provides sufficient methods via the Element interface for getting, setting and removing attributes.
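For instance, an attribute can be switched on and off using only the generic Element methods, without touching any HTML-specific property. This is a sketch; toggleAttributeValue is a hypothetical helper and only relies on hasAttribute, setAttribute and removeAttribute.

```javascript
// Hypothetical helper: toggle an attribute using only the core DOM Element interface
// (hasAttribute / setAttribute / removeAttribute), no HTML-specific properties.
function toggleAttributeValue(element, name, value) {
  if (element.hasAttribute(name)) {
    element.removeAttribute(name);
    return false; // the attribute is now absent
  }
  element.setAttribute(name, value);
  return true; // the attribute is now present
}

// Usage in a browser, for example:
// toggleAttributeValue(someElement, "class", "highlight");
```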
Problem: My Favourite JS Library still Breaks
If you use JavaScript libraries like the famous prototype.js or Yahoo's library, there is bad news for you: as long as the developers don't apply the fixes mentioned above, you won't be able to use them in your XML-based XHTML applications.
There are still two possible ways forward, but neither is very promising: take the library, recode it and publish it, or e-mail the developers, e-mail your friends to e-mail the developers and e-mail your customers to e-mail the developers. If they get the hint and are not too annoyed, perhaps they will start to implement XML features in their libraries.
I Read about E4X. Now, This Is Perfect, Isn't It?
As a matter of fact, it isn't. E4X is a new method of using and manipulating XML in JavaScript. But although it was standardized by ECMA, the standard does not define an interface that lets E4X objects interact with the DOM objects our document consists of. So, for all the advantages E4X has, without a DOM interface you can't use it productively to manipulate your document. However, it can be used for data, and an E4X object can be converted into a string, which can then be converted into a DOM object. DOM objects can similarly be converted into strings, which can then be converted into E4X.
Finally: Content Negotiation
Now, how do we decide when to serve XHTML as XML? We can do this on the server side by evaluating the HTTP request headers. Every browser sends with its request a list of MIME types it understands, as part of the HTTP content negotiation mechanism. So if the browser tells our server that it can handle XHTML as XML, that is, the Accept: field in the HTTP header contains application/xhtml+xml somewhere, we are safe to send the content as XML.
In PHP, for example, you would write something like this:
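// strpos() returns 0 (which is falsy) for a match at the very start, so compare against false explicitly
if ( isset( $_SERVER['HTTP_ACCEPT'] ) && strpos( $_SERVER['HTTP_ACCEPT'], "application/xhtml+xml" ) !== false ) {
    header( "Content-type: application/xhtml+xml" );
    echo '<?xml version="1.0" ?>'."\n";
} else {
    header( "Content-type: text/html" );
}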
The XML branch also sends the XML declaration, which is strongly recommended when the document is an XML file. If the content is sent as HTML, an XML declaration would break IE's Doctype switch, so we don't want it there.
For completeness, here is the Accept field that Firefox 2.0.0.9 sends with its requests:
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
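A plain substring test ignores the q values in such a header, so a browser that lists application/xhtml+xml with q=0 (meaning "not acceptable") would still be sent XML. The sketch below shows one way to take the q value into account. It is written in JavaScript purely for illustration, chooseMediaType is a hypothetical function, and it deliberately looks only at an explicit application/xhtml+xml entry, ignoring wildcards such as */*.

```javascript
// Hypothetical helper: pick the media type to serve from an HTTP Accept header value.
function chooseMediaType(acceptHeader) {
  var XHTML = "application/xhtml+xml";
  var HTML = "text/html";
  var xhtmlQuality = 0; // stays 0 if the header never mentions XHTML
  var ranges = String(acceptHeader || "").split(",");
  for (var i = 0; i < ranges.length; i++) {
    var parts = ranges[i].split(";");
    if (parts[0].trim().toLowerCase() !== XHTML) continue;
    var quality = 1; // a media range without a q parameter defaults to q=1
    for (var j = 1; j < parts.length; j++) {
      var match = /^\s*q\s*=\s*([0-9.]+)\s*$/i.exec(parts[j]);
      if (match) quality = parseFloat(match[1]);
    }
    xhtmlQuality = quality;
  }
  // Serve XHTML only if the client explicitly accepts it with a non-zero quality.
  return xhtmlQuality > 0 ? XHTML : HTML;
}
```

With the Firefox header shown above, application/xhtml+xml carries no q parameter (the q=0.9 belongs to text/html only), so this function selects application/xhtml+xml.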
Further Reading
You will find several useful articles in the developer wiki:
DOM 2 methods you will need are: