Costruire e decostruire un documento XML

Questa pagina è in fase di traduzione: contribuisci anche tu completando le parti mancanti. Il testo da tradurre potrebbe essere nascosto nella pagina: vai in modifica per visualizzarlo

Quest'articolo si propone di fornire una guida esaustiva per l'uso di XML per mezzo Javascript. Esso si divide in due sezioni. Nella prima sezione verranno illustrati tutti i possibili metodi per costruire un albero DOM, nella seconda invece si darà per scontato che saremo già in possesso di un albero DOM e il nostro scopo sarà quello di trattarne il contenuto.

Che cos'è un albero DOM?

Per albero DOM s'intende un'istanza di Document. Si tratta quindi di un oggetto Javascript e non è da confondere con una stringa di testo contenente il codice sorgente di un documento XML ancora da parsare.

DOM trees can be queried using XPath expressions, converted to strings or written to a local or remote files using XMLSerializer (without having to first convert to a string), POSTed to a web server (via XMLHttpRequest),

You can use DOM trees to model data which isn't well-suited for RDF (or perhaps you just don't like RDF). Another application is that, since XUL is XML, the UI of your application can be dynamically manipulated, downloaded, uploaded, saved, loaded, converted, or transformed quite easily.

Mozilla gestisce ampiamente XML. Sono gestite diverse Raccomandazioni e bozze del World Wide Web Consortium (W3C) per la famiglia XML, così come altre tecnologie relative. Tra le più importanti tecnologie native che Mozilla offre per lavorare con documenti XML sono da citare:

XPath per indirizzare parti diverse di un documento XML,
XMLSerializer per convertire alberi DOM in stringhe o files,
DOMParser costruire un documento XML convertendo delle stringhe in alberi DOM,
XMLHttpRequest per parsare a partire da file documenti XML in albero DOM. Sebbene anche le istanze di DOMParser abbiano un metodo chiamato parseFromStream(), è più facile utilizzare XMLHttpRequest che lavore sia con file remoti (non confinati al solo protocollo HTTP) che con file locali,
XSLT e XLink per manipolare il contenuto di un documento XML.

È possibile comunque creare manualmente propri algoritmi per la serializzazione o la conversione di un documento XML, come si vedrà in seguito.

Prima parte: costruire un albero DOM

Come precedentemente accennato, in questa prima sezione il nostro scopo sarà quello di ottenere un albero DOM.

Un albero DOM è un oggetto (e precisamente un'istanza di Document). Ci sono molti modi per costruirlo o ottenerlo, a seconda delle proprie esigenze. Di seguito verranno elencate varie strade: a partire da una stringa di codice sorgente, a partire da file o a partire da strutture di differente natura.

Creare dinamicamente un albero DOM

Questo paragrafo illustra come utilizzare l'API JavaScript W3C DOM per creare e modificare oggetti DOM. Essa è attiva in tutte le applicazioni Gecko-based (come Firefox, per esempio) sia in privileged code (estensioni) che in unprivileged code (pagine internet).

Scrivendolo a mano

L'API JavaScript W3C DOM, supportata da Mozilla, può essere invocata manualmente.

Si consideri il seguente documento XML:

<?xml version="1.0"?>
<people>
  <person first-name="eric" middle-initial="H" last-name="jung">
    <address street="321 south st" city="denver" state="co" country="usa" />
    <address street="123 main st" city="arlington" state="ma" country="usa" />
  </person>
  <person first-name="jed" last-name="brown">
    <address street="321 north st" city="atlanta" state="ga" country="usa" />
    <address street="123 west st" city="seattle" state="wa" country="usa" />
    <address street="321 south avenue" city="denver" state="co" country="usa" />
  </person>
</people>

Grazie all'API W3C DOM è possibile creare una rappresentazione di esso come la seguente, presente unicamente nella memoria dell'interprete:

var doc = document.implementation.createDocument("", "", null);
var peopleElem = doc.createElement("people");

var personElem1 = doc.createElement("person");
personElem1.setAttribute("first-name", "eric");
personElem1.setAttribute("middle-initial", "h");
personElem1.setAttribute("last-name", "jung");

var addressElem1 = doc.createElement("address");
addressElem1.setAttribute("street", "321 south st");
addressElem1.setAttribute("city", "denver");
addressElem1.setAttribute("state", "co");
addressElem1.setAttribute("country", "usa");
personElem1.appendChild(addressElem1);

var addressElem2 = doc.createElement("address");
addressElem2.setAttribute("street", "123 main st");
addressElem2.setAttribute("city", "arlington");
addressElem2.setAttribute("state", "ma");
addressElem2.setAttribute("country", "usa");
personElem1.appendChild(addressElem2);

var personElem2 = doc.createElement("person");
personElem2.setAttribute("first-name", "jed");
personElem2.setAttribute("last-name", "brown");

var addressElem3 = doc.createElement("address");
addressElem3.setAttribute("street", "321 north st");
addressElem3.setAttribute("city", "atlanta");
addressElem3.setAttribute("state", "ga");
addressElem3.setAttribute("country", "usa");
personElem2.appendChild(addressElem3);

var addressElem4 = doc.createElement("address");
addressElem4.setAttribute("street", "123 west st");
addressElem4.setAttribute("city", "seattle");
addressElem4.setAttribute("state", "wa");
addressElem4.setAttribute("country", "usa");
personElem2.appendChild(addressElem4);

var addressElem5 = doc.createElement("address");
addressElem5.setAttribute("street", "321 south avenue");
addressElem5.setAttribute("city", "denver");
addressElem5.setAttribute("state", "co");
addressElem5.setAttribute("country", "usa");
personElem2.appendChild(addressElem5);

peopleElem.appendChild(personElem1);
peopleElem.appendChild(personElem2);
doc.appendChild(peopleElem);

Si veda anche Il capitolo sul DOM del Tutorial XUL (in inglese).

Automatizzando la creazione dinamica dell'albero DOM

L'invocazione dell'API Javascript W3C DOM, può essere anche automatizzata.

Non esiste un metodo unico per automatizzare la creazione di un documento XML. Esso dipende molto dal tipo di dati che andremo a scrivere. In ogni caso, per vederne un possibile esempio, si vada all'ultimo paragrafo del capitolo su JXON.

Costruire un albero DOM XML a partire da stringhe di codice sorgente

Il seguente esempio mostra la costruzione di un albero DOM tramite parsing di un codice sorgente.

var sSource = "<a id=\"a\"><b id=\"b\">hey!<\/b><\/a>";
var oParser = new DOMParser();
var oDOM = oParser.parseFromString(sSource, "text\/xml");
// print the name of the root element or error message
dump(oDOM.documentElement.nodeName == "parsererror" ? "error while parsing" : oDOM.documentElement.nodeName);

Tutorial su come rendere questo codice cross browser (in inglese)

Costruire un albero DOM a partire da un file

Preambolo da stendere.

Usando `DOMParser`

Ciascuna istanza di DOMParser possiede diversi metodi per parsare un documento XML a partire da un file. È possibile fare ricorso a parseFromStream():

function loadXMLFile (sFile) {
  var oIOServ = Components.classes["@mozilla.org/network/io-service;1"].getService(Components.interfaces.nsIIOService);
  var oChannel = oIOServ.newChannel(sFile,null,null);
  var oStream = oChannel.open();
  // alert("oStream.available() = " + oStream.available()); // debug
  var parser = new DOMParser();

  doc = parser.parseFromStream(oStream, "", oStream.available(),"text/xml");

  // alert("doc=" + doc); // debug
  oStream.close();
  
  return doc;
}

// alert(loadXMLFile("file:///home/john/hello.xml"));

oppure utilizzare parseFromBuffer():

// Esempio mancante

In ogni caso il metodo più pratico per accedere al contenuto di un file XML resta ajax, per l'uso del quale si rimanda al prossimo paragrafo.

Usando `XMLHttpRequest`

Come già precedentemente accennato, sebbene ciascuna istanza di DOMParser possegga un metodo chiamato parseFromStream(), è più facile utilizzare XMLHttpRequest per parsare documenti XML in alberi DOM (XMLHttpRequest funziona bene sia in locale che in remoto). Di seguito c'è un codice di esempio che legge e parsa in un albero DOM un file XML locale:

var oReq = new XMLHttpRequest();
oReq.open("GET", "chrome://passwdmaker/content/people.xml", false);
oReq.send(null);
// print the name of the root element or error message
var oDOM = oReq.responseXML;
dump(oDOM.documentElement.nodeName == "parsererror" ? "error while parsing" : oDOM.documentElement.nodeName);

N.B. Il metodo responseXML è sempre un'istanza di Document – e di conseguenza un oggetto – a differenza del metodo responseText, che è sempre un valore primario (una stringa).

Usando l'elemento `<object>`.

Di seguito è presentata un'altra via possibile per parsare un file XML in un albero DOM: usando il tag <object>. Prima di lanciare il seguente esempio è necessario creare un file XML chiamato purchase_order.xml e contenente un albero simile a questo:

purchase_order.xml

<?xml version="1.0"?>
<purchaseOrder xmlns="https://example.mozilla.org/PurchaseOrderML">
  <lineItem>
    <name>Line Item 1</name>
    <price>1.25</price>
  </lineItem>
  <lineItem>
    <name>Line Item 2</name>
    <price>2.48</price>
  </lineItem>
</purchaseOrder>

Adesso proviamo a lanciare il nostro esempio:

<!doctype html>
<html>
<head>
<title>XML Data Block Demo</title>
<script>
function runDemo() {
  var doc = document.getElementById("purchase-order").contentDocument;
  var lineItems = doc.getElementsByTagNameNS("https://example.mozilla.org/PurchaseOrderML", "lineItem");
  var firstPrice = lineItems[0].getElementsByTagNameNS("https://example.mozilla.org/PurchaseOrderML", "price")[0].textContent;
  document.getElementById("output-box").textContent = "The purchase order contains " + lineItems.length + " line items. The price of the first line item is " + firstPrice + ".";
}
</script>
</head>
<body onload="runDemo()";>
<object id="purchase-order" data="purchase_order.xml" type="text/xml" style="display: none;"></object>
<div id="output-box">Demo did not run</div>
</body>
</html>

Per ulteriori approfondimenti, si rimanda all'articolo: Usare le XML Data Islands in Mozilla.

Seconda parte: decostruire un albero DOM

Da adesso in poi daremo per scontato il fatto che abbiamo già un albero DOM nella memoria dell'interprete Javascript e che il nostro scopo è quello di utilizzare tale istanza di Document nei modi più disparati.

Convertire un documento XML in stringhe di codice sorgente

L'esempio seguente mostra come ottenere dalla variabile doc — il nostro albero DOM — una stringa contenente l'intero suo codice sorgente:

var oSerializer = new XMLSerializer();
var sXML = oSerializer.serializeToString(doc);

Non è possibile creare un istanza di XMLSerializer (ovvero lanciare: new XMLSerializer()) dall'interno di un componente JS XPCOM o dall'interno di un modulo. Per farlo bisogna lanciare:

var oSerializer = Components.classes["@mozilla.org/xmlextras/xmlserializer;1"].createInstance(Components.interfaces.nsIDOMSerializer);
var sXML = oSerializer.serializeToString(doc);

Come ottenere stringhe di codice sorgente di facile lettura

You can pretty print a DOM tree using XMLSerializer and E4X. First, create a DOM tree as described in the Come creare un albero DOM article. Alternatively, use a DOM tree obtained from XMLHttpRequest. We assume it's in the doc variable.

var oSerializer = new XMLSerializer();
var sPrettyXML = XML(oSerializer.serializeToString(doc)).toXMLString();

Indents are provided with two spaces. You can, of course, use DOM:treeWalker to write your own, more performant version which also has the advantage that you can customize the indent string to be whatever you like.

Note: When using the E4X toXMLString method your CDATA elements will be lost and only the containing text remains. So using the above method might not be useful if you have CDATA elements in your XML.

<content><![CDATA[This is the content]]></content>

Will become

<content>This is the content</content>

Convertire un foglio XML in un albero di oggetti Javascript (JXON)

JXON (lossless Javascript XML Object Notation) è un nome generico col quale viene definita la rappresentazione di oggetti Javascript in linguaggio XML. Non esistono veri e propri standard per questa rappresentazione, ma da poco tempo a questa parte cominciano ad affacciarsi in rete alcune convenzioni.

JXON non è un metodo per indirizzare poche parti di un documento XML, dato che il suo punto di forza è la conversione per intero di un albero DOM. Se il nostro scopo è quello di accedere a delle informazioni limitate di un albero DOM, si raccomanda vivamente di Usare XPath.

Ci sono casi invece in cui un documento XML è costruito in maniera tale da avere come principale destinatario del proprio contenuto proprio l'interprete Javascript. In tal caso JXON si presenta come la via migliore.

Per tutto questo capitolo immagineremo di aver parsato, come al solito nella nostra variabile doc, questo documento XML di esempio:

esempio.xml

<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater">
      <catalog_item gender="Men's">
         <item_number>QWZ5671</item_number>
         <price>39.95</price>
         <size description="Medium">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
         <size description="Large">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
      </catalog_item>
      <catalog_item gender="Women's">
         <item_number>RRX9856</item_number>
         <discount_until>Dec 25, 1995</discount_until>
         <price>42.50</price>
         <size description="Medium">
            <color_swatch image="black_cardigan.jpg">Black</color_swatch>
         </size>
      </catalog_item>
   </product>
   <script type="text/javascript"><![CDATA[function matchwo(a,b) {
    if (a < b && a < 0) { return 1; }
    else { return 0; }
}]]></script>
</catalog>

Adesso proveremo a ottenere una rappresentazione della variabile doc — l'albero DOM — attraverso un albero di oggetti Javascript (per approfondire si leggano le guide su come lavorare con gli oggetti e su come Javascript sia Object-Oriented). Per far ciò potremo utilizzare diversi algoritmi di conversione.

Per semplicità gli algoritmi qui proposti (si veda: #1, #2, #3, #4) prenderanno in considerazione unicamente i seguenti tipi di nodi e i loro attributi:

Document (solo come argomento della funzione),
DocumentFragment (solo come argomento della funzione),
Element,
Text (mai come argomento della funzione),
CDATASection (mai come argomento della funzione).

Si tratta di un buon compromesso per un uso Javascript, dacché la gran parte delle informazioni di un documento XML è contenuta in questo tipo di nodi. Ogni altra informazione (come processing instructions, xml schemas, commenti, etc.) andrà persa. Allo scopo di evitare conflitti, la lettura dei nomi dei nodi e dei loro attributi è case insensitive (resa sempre in minuscolo) e di conseguenza le proprietà locali dell'albero di oggetti così ottenuto, aggiunte via JavaScript, dovranno avere sempre un qualche tipo di lettera maiuscola al loro interno (per evitare di sovrascrivere le proprietà ottenute dal foglio XML), come si può vedere di seguito. I seguenti algoritmi sono liberamente basati sulla Convenzione di Parker, versione 0.4, che prevede il riconoscimento del typeof del contenuto di testo di ogni singolo nodo letto.

Algoritmo #1: una via prolissa

Questo semplice costruttore ricorsivo converte un albero DOM XML in un albero di oggetti Javascript. Il contenuto di testo di ogni nodo sarà salvato all'interno della proprietà keyValue, mentre i nodeAttributes, se esistono, saranno annidati come proprietà dell'oggetto-figlio keyAttributes. L'argomento del costruttore potrà essere l'intero Document, un DocumentFragment o, più semplicemente, un nodo di tipo Element di esso.

function buildValue(sValue) {
  if (/^\s*$/.test(sValue)) { return null; }
  if (/^(true|false)$/i.test(sValue)) { return sValue.toLowerCase() === "true"; }
  if (isFinite(sValue)) { return parseFloat(sValue); }
  if (isFinite(Date.parse(sValue))) { return new Date(sValue); }
  return sValue;
}

function JXONData (oXMLParent) {
  var nAttrLen = 0, nLength = 0, sCollectedTxt = "";
  // children
  if (oXMLParent.hasChildNodes()) {
    for (var oItChild, sItKey, sItVal, nChildId = 0; nChildId < oXMLParent.childNodes.length; nChildId++) {
      oItChild = oXMLParent.childNodes.item(nChildId);
      if ((oItChild.nodeType + 1 | 1) === 5) { sCollectedTxt += oItChild.nodeType === 3 ? oItChild.nodeValue.replace(/^\s+|\s+$/g, "") : oItChild.nodeValue; } // nodeType is "Text" (3) or "CDATASection" (4)
      else if (oItChild.nodeType === 1 && !oItChild.prefix) { // nodeType is "Element" (1)
        sItKey = oItChild.nodeName.toLowerCase();
        sItVal = new JXONData(oItChild);
        if (this.hasOwnProperty(sItKey)) {
          if (this[sItKey].constructor !== Array) { this[sItKey] = [this[sItKey]]; }
          this[sItKey].push(sItVal);
        } else { this[sItKey] = sItVal; nLength++; }
      }
    }
    this.keyValue = buildValue(sCollectedTxt);
  } else { this.keyValue = null; }
  // node attributes
  if (oXMLParent.hasAttributes()) {
    var oItAttr;
    this.keyAttributes = {};
    for (nAttrLen; nAttrLen < oXMLParent.attributes.length; nAttrLen++) {
      oItAttr = oXMLParent.attributes.item(nAttrLen);
      this.keyAttributes[oItAttr.nodeName.toLowerCase()] = buildValue(oItAttr.nodeValue);
    }
  }
  // optional properties and methods; you could safely adjoust/remove them...
  this.keyLength = nLength;
  this.attributesLength = nAttrLen;
  // this.DOMNode = oXMLParent;
  this.valueOf = function() { return this.keyValue; };
  this.toString = function() { return String(this.keyValue); };
  this.getItem = function(nItem) {
    if (nLength === 0) { return null; }
    var iItem = 0;
    for (var sKeyName in this) { if (iItem === nItem) { return this[sKeyName]; } iItem++; }
    return null;
  };
  this.getAttribute = function(nAttrib) {
    if (nAttrLen === 0 || nAttrib + 1 > nAttrLen) { return null; }
    var nItAttr = 0;
    for (var sAttrName in this.keyAttributes) { if (nItAttr === nAttrib) { return this.keyAttributes[sAttrName]; } nItAttr++; }
    return null;
  };
  this.hasChildren = function() { return this.keyLength > 0; };
}

var myObject = new JXONData(doc);
// abbiamo ottenuto il nostro albero di oggetti Javascript! provare per credere: alert(JSON.stringify(myObject));

Con questo algoritmo il nostro esempio diventerà:

{
  "catalog": {
    "product": {
      "catalog_item": [{
        "item_number": {
          "keyValue": "QWZ5671",
          "keyLength": 0,
          "attributesLength": 0
        },
        "price": {
          "keyValue": 39.95,
          "keyLength": 0,
          "attributesLength": 0
        },
        "size": [{
          "color_swatch": [{
            "keyValue": "Red",
            "keyAttributes": {
              "image": "red_cardigan.jpg"
            },
            "keyLength": 0,
            "attributesLength": 1
          }, {
            "keyValue": "Burgundy",
            "keyAttributes": {
              "image": "burgundy_cardigan.jpg"
            },
            "keyLength": 0,
            "attributesLength": 1
          }],
          "keyValue": null,
          "keyAttributes": {
            "description": "Medium"
          },
          "keyLength": 1,
          "attributesLength": 1
        }, {
          "color_swatch": [{
            "keyValue": "Red",
            "keyAttributes": {
              "image": "red_cardigan.jpg"
            },
            "keyLength": 0,
            "attributesLength": 1
          }, {
            "keyValue": "Burgundy",
            "keyAttributes": {
              "image": "burgundy_cardigan.jpg"
            },
            "keyLength": 0,
            "attributesLength": 1
          }],
          "keyValue": null,
          "keyAttributes": {
            "description": "Large"
          },
          "keyLength": 1,
          "attributesLength": 1
        }],
        "keyValue": null,
        "keyAttributes": {
          "gender": "Men's"
        },
        "keyLength": 3,
        "attributesLength": 1
      }, {
        "item_number": {
          "keyValue": "RRX9856",
          "keyLength": 0,
          "attributesLength": 0
        },
        "discount_until": {
          "keyValue": new Date(1995, 11, 25),
          "keyLength": 0,
          "attributesLength": 0
        },
        "price": {
          "keyValue": 42.5,
          "keyLength": 0,
          "attributesLength": 0
        },
        "size": {
          "color_swatch": {
            "keyValue": "Black",
            "keyAttributes": {
              "image": "black_cardigan.jpg"
            },
            "keyLength": 0,
            "attributesLength": 1
          },
          "keyValue": null,
          "keyAttributes": {
            "description": "Medium"
          },
          "keyLength": 1,
          "attributesLength": 1
        },
        "keyValue": null,
        "keyAttributes": {
          "gender": "Women's"
        },
        "keyLength": 4,
        "attributesLength": 1
      }],
      "keyValue": null,
      "keyAttributes": {
        "description": "Cardigan Sweater"
      },
      "keyLength": 1,
      "attributesLength": 1
    },
    "script": {
      "keyValue": "function matchwo(a,b) {\n  if (a < b && a < 0) { return 1; }\n  else { return 0; }\n}",
      "keyAttributes": {
        "type": "text/javascript"
      },
      "keyLength": 0,
      "attributesLength": 1
    },
    "keyValue": null,
    "keyLength": 2,
    "attributesLength": 0
  },
  "keyValue": null,
  "keyLength": 1,
  "attributesLength": 0
}

È un approccio raccomandato nel caso in cui ci sia completamente ignota la struttura del documento XML che andremo a leggere.

Algoritmo #2: una via un po' meno prolissa

Quello che segue è un altro, più semplice, metodo di conversione. Dove i nodeAttributes saranno annidati nello stesso oggetto contenente la trascrizione dei nodi figli sebbene, a differenza di quelli, questi saranno contrassegnati dal prefisso “@”. Come sopra, il contenuto di testo di ciascun nodo sarà affidato alla proprietà keyValue. L'argomento del costruttore potrà essere l'intero Document, un DocumentFragment o, più semplicemente, un nodo di tipo Element di esso.

function buildValue(sValue) {
  if (/^\s*$/.test(sValue)) { return null; }
  if (/^(true|false)$/i.test(sValue)) { return sValue.toLowerCase() === "true"; }
  if (isFinite(sValue)) { return parseFloat(sValue); }
  if (isFinite(Date.parse(sValue))) { return new Date(sValue); }
  return sValue;
}

function JXONData (oXMLParent) {
  if (oXMLParent.hasChildNodes()) {
    var sCollectedTxt = "";
    for (var oItChild, sItKey, sItVal, nChildId = 0; nChildId < oXMLParent.childNodes.length; nChildId++) {
      oItChild = oXMLParent.childNodes.item(nChildId);
      if ((oItChild.nodeType + 1 | 1) === 5) { sCollectedTxt += oItChild.nodeType === 3 ? oItChild.nodeValue.replace(/^\s+|\s+$/g, "") : oItChild.nodeValue; }
      else if (oItChild.nodeType === 1 && !oItChild.prefix) {
        sItKey = oItChild.nodeName.toLowerCase();
        sItVal = new JXONData(oItChild);
        if (this.hasOwnProperty(sItKey)) {
          if (this[sItKey].constructor !== Array) { this[sItKey] = [this[sItKey]]; }
          this[sItKey].push(sItVal);
        } else { this[sItKey] = sItVal; }
      }
    }
    if (sCollectedTxt) { this.keyValue = buildValue(sCollectedTxt); }
  }
  if (oXMLParent.hasAttributes()) {
    var oItAttr;
    for (var iAttrId = 0; iAttrId < oXMLParent.attributes.length; iAttrId++) {
      oItAttr = oXMLParent.attributes.item(iAttrId);
      this["@" + oItAttr.nodeName.toLowerCase()] = buildValue(oItAttr.nodeValue);
    }
  }
}

var myObject = new JXONData(doc);
// abbiamo ottenuto il nostro albero di oggetti Javascript! provare per credere: alert(JSON.stringify(myObject));

Con questo algoritmo il nostro esempio diventerà:

{
  "catalog": {
    "product": {
      "catalog_item": [{
        "item_number": {
          "keyValue": "QWZ5671"
        },
        "price": {
          "keyValue": 39.95
        },
        "size": [{
          "color_swatch": [{
            "keyValue": "Red",
            "@image": "red_cardigan.jpg"
          }, {
            "keyValue": "Burgundy",
            "@image": "burgundy_cardigan.jpg"
          }],
          "@description": "Medium"
        }, {
          "color_swatch": [{
            "keyValue": "Red",
            "@image": "red_cardigan.jpg"
          }, {
            "keyValue": "Burgundy",
            "@image": "burgundy_cardigan.jpg"
          }],
          "@description": "Large"
        }],
        "@gender": "Men's"
      }, {
        "item_number": {
          "keyValue": "RRX9856"
        },
        "discount_until": {
          "keyValue": new Date(1995, 11, 25)
        },
        "price": {
          "keyValue": 42.5
        },
        "size": {
          "color_swatch": {
            "keyValue": "Black",
            "@image": "black_cardigan.jpg"
          },
          "@description": "Medium"
        },
        "@gender": "Women's"
      }],
      "@description": "Cardigan Sweater"
    },
    "script": {
      "keyValue": "function matchwo(a,b) {\n  if (a < b && a < 0) { return 1; }\n  else { return 0; }\n}",
      "@type": "text/javascript"
    }
  }
}

È un approccio possibile nel caso in cui ci sia parzialmente nota la struttura del documento XML che andremo a leggere.

Algoritmo #3: una via sintetica

Ora proveremo un altro metodo di conversione. Questo algoritmo è quello che si avvicina di più alla Convenzione di Parker. Esso è molto simile al precedente, eccetto che per il fatto che i nodi che non contengono alcun nodo-figlio di tipo Element, ma solo nodi-figli di tipo Text e/o CDATASection, non saranno rappresentati da oggetti, ma direttamente da booleani, numeri, stringhe o istanze del costruttore Date (si veda la Convenzione di Parker). La rappresentazione dei nodi completamente vuoti invece (cioè che non contengono né nodi di tipo Element, né nodi di tipo Text, né nodi di tipo CDATASection) avranno come valore predefinito true (su questo punto si vedano le Considerazioni sul codice). Inoltre questa volta non si è ricorso a un costruttore, ma a una funzione. L'argomento della funzione potrà essere l'intero Document, un DocumentFragment o, più semplicemente, un nodo di tipo Element di esso.

In molti casi questo rappresenta il metodo di conversione più pratico.

function buildValue(sValue) {
  if (/^\s*$/.test(sValue)) { return null; }
  if (/^(true|false)$/i.test(sValue)) { return sValue.toLowerCase() === "true"; }
  if (isFinite(sValue)) { return parseFloat(sValue); }
  if (isFinite(Date.parse(sValue))) { return new Date(sValue); }
  return sValue;
}

function getJXONData (oXMLParent) {
  var vResult = /* put here the default value for empty nodes! */ true, nLength = 0, sCollectedTxt = "";
  if (oXMLParent.hasAttributes()) {
    vResult = {};
    for (nLength; nLength < oXMLParent.attributes.length; nLength++) {
      oItAttr = oXMLParent.attributes.item(nLength);
      vResult["@" + oItAttr.nodeName.toLowerCase()] = buildValue(oItAttr.nodeValue.replace(/^\s+|\s+$/g, ""));
    }
  }
  if (oXMLParent.hasChildNodes()) {
    for (var oItChild, sItKey, sItVal, nChildId = 0; nChildId < oXMLParent.childNodes.length; nChildId++) {
      oItChild = oXMLParent.childNodes.item(nChildId);
      if (oItChild.nodeType === 4) { sCollectedTxt += oItChild.nodeValue; } /* nodeType is "CDATASection" (4) */
      else if (oItChild.nodeType === 3) { sCollectedTxt += oItChild.nodeValue.replace(/^\s+|\s+$/g, ""); } /* nodeType is "Text" (3) */
      else if (oItChild.nodeType === 1 && !oItChild.prefix) { /* nodeType is "Element" (1) */
         if (nLength === 0) { vResult = {}; }
        sItKey = oItChild.nodeName.toLowerCase();
        sItVal = getJXONData(oItChild);
        if (vResult.hasOwnProperty(sItKey)) {
          if (vResult[sItKey].constructor !== Array) { vResult[sItKey] = [vResult[sItKey]]; }
          vResult[sItKey].push(sItVal);
        } else { vResult[sItKey] = sItVal; nLength++; }
      }
     }
  }
  if (sCollectedTxt) { nLength > 0 ? vResult.keyValue = buildValue(sCollectedTxt) : vResult = buildValue(sCollectedTxt); }
  /* if (nLength > 0) { Object.freeze(vResult); } */
  return vResult;
}

var myObject = getJXONData(doc);
// abbiamo ottenuto il nostro albero di oggetti Javascript! provare per credere: alert(JSON.stringify(myObject));

Nota: Se si vuole congelare l'intero oggetto (a causa della natura "statica" di un documento XML), decommentare la stringa: /* if (nLength > 0) { Object.freeze(vResult); } */. Il metodo Object.freeze vieta l'aggiunta di nuove proprietà e la rimozione delle proprietà esistenti, congelando la loro enumerabilità, la loro configurabilità o la loro scrivibilità. In sostanza l'oggetto è reso effettivamente immutabile.

Con questo algoritmo il nostro esempio diventerà:

{
  "catalog": {
    "product": {
      "@description": "Cardigan Sweater",
      "catalog_item": [{
        "@gender": "Men's",
        "item_number": "QWZ5671",
        "price": 39.95,
        "size": [{
          "@description": "Medium",
          "color_swatch": [{
            "@image": "red_cardigan.jpg",
            "keyValue": "Red"
          }, {
            "@image": "burgundy_cardigan.jpg",
            "keyValue": "Burgundy"
          }]
        }, {
          "@description": "Large",
          "color_swatch": [{
            "@image": "red_cardigan.jpg",
            "keyValue": "Red"
          }, {
            "@image": "burgundy_cardigan.jpg",
            "keyValue": "Burgundy"
          }]
        }]
      }, {
        "@gender": "Women's",
        "item_number": "RRX9856",
        "discount_until": new Date(1995, 11, 25),
        "price": 42.5,
        "size": {
          "@description": "Medium",
          "color_swatch": {
            "@image": "black_cardigan.jpg",
            "keyValue": "Black"
          }
        }
      }]
    },
    "script": {
      "@type": "text/javascript",
      "keyValue": "function matchwo(a,b) {\n  if (a < b && a < 0) { return 1; }\n  else { return 0; }\n}"
    }
  }
}

È un approccio raccomandato nel caso in cui ci sia nota la struttura del documento XML che andremo a leggere.

Algoritmo #4: una via davvero minimalista

La seguente rappresenta un'altra possibile via di conversione. Anch'essa è molto vicina alla Convenzione di Parker. Con questo algoritmo la rappresentazione dei nodi di tipo Element che contengono a loro volta sullo stesso piano nodi-figli di tipo Element insieme con nodi-figli di tipo Text e/o di tipo CDATASection verrà resa per mezzo di istanze dei costruttori Boolean, Number, String, e Date. E di conseguenza la trascrizione di ogni eventuale nodo-figlio sarà annidata in oggetti di questo tipo.

Per esempio;

<employee type="usher">John Smith</employee>
<manager>Lisa Carlucci</manager>

diventerà

var myObject = {
  "employee": new String("John Smith"),
  "manager": "Lisa Carlucci"
};

myObject.employee["@type"] = "usher";

// test

alert(myObject.manager); // "Lisa Carlucci"
alert(myObject.employee["@type"]); // "usher"
alert(myObject.employee); // "John Smith"

Come per il terzo algoritmo, i nodi che non contengono alcun nodo-figlio di tipo Element, ma solo nodi-figli di tipo Text e/o CDATASection, non saranno rappresentati da oggetti, ma direttamente da booleani, numeri, stringhe (valori primitivi) o da istanze del costruttore Date (si veda la Convenzione di Parker). Come per il terzo algoritmo, non si è usato un costruttore, ma una semplice funzione. L'argomento della funzione potrà essere l'intero Document, un DocumentFragment o, più semplicemente, un nodo di tipo Element di esso.

function buildValue (sValue) {
  if (/^\s*$/.test(sValue)) { return null; }
  if (/^(true|false)$/i.test(sValue)) { return sValue.toLowerCase() === "true"; }
  if (isFinite(sValue)) { return parseFloat(sValue); }
  if (isFinite(Date.parse(sValue))) { return new Date(sValue); }
  return sValue;
}

function objectify (vValue) {
  if (vValue === null) {
    return new (function() {
      this.toString = function() { return "null"; }
      this.valueOf = function() { return null; }
    })();
  }
  return vValue instanceof Object ? vValue : new vValue.constructor(vValue);
}

var aTmpEls = []; // loaded element nodes cache

function getJXONData (oXMLParent) {
  var  sItKey, sItVal, vResult, nLength = 0, nLevelStart = aTmpEls.length,
       nChildren = oXMLParent.hasChildNodes() ? oXMLParent.childNodes.length : 0, sCollectedTxt = "";

  for (var oItChild, nChildId = 0; nChildId < nChildren; nChildId++) {
    oItChild = oXMLParent.childNodes.item(nChildId);
    if (oItChild.nodeType === 4) { sCollectedTxt += oItChild.nodeValue; } /* nodeType is "CDATASection" (4) */
    else if (oItChild.nodeType === 3) { sCollectedTxt += oItChild.nodeValue.replace(/^\s+|\s+$/g, ""); } /* nodeType is "Text" (3) */
    else if (oItChild.nodeType === 1 && !oItChild.prefix) { aTmpEls.push(oItChild); } /* nodeType is "Element" (1) */
  }

  var nLevelEnd = aTmpEls.length, vBuiltVal = buildValue(sCollectedTxt);

  if (oXMLParent.hasAttributes()) {
    vResult = objectify(vBuiltVal);
    for (nLength; nLength < oXMLParent.attributes.length; nLength++) {
      oItAttr = oXMLParent.attributes.item(nLength);
      vResult["@" + oItAttr.nodeName.toLowerCase()] = buildValue(oItAttr.nodeValue.replace(/^\s+|\s+$/g, ""));
    }
  } else if (nLevelEnd > nLevelStart) { vResult = objectify(vBuiltVal); }

  for (var nElId = nLevelStart; nElId < nLevelEnd; nElId++) {
    sItKey = aTmpEls[nElId].nodeName.toLowerCase();
    sItVal = getJXONData(aTmpEls[nElId]);
    if (vResult.hasOwnProperty(sItKey)) {
    if (vResult[sItKey].constructor !== Array) { vResult[sItKey] = [vResult[sItKey]]; }
      vResult[sItKey].push(sItVal);
    } else { vResult[sItKey] = sItVal; nLength++; }
  }

  aTmpEls.length = nLevelStart;

  if (nLength === 0) { vResult = sCollectedTxt ? vBuiltVal : /* put here the default value for empty nodes: */ true; }
  /* else { Object.freeze(vResult); } */

  return vResult;
}

var myObject = getJXONData(doc);
alert(myObject.catalog.product.catalog_item[1].size.color_swatch["@image"]); // "black_cardigan.jpg"
alert(myObject.catalog.product.catalog_item[1].size.color_swatch); // "Black" !

Nota: Se si vuole congelare l'intero oggetto (a causa della natura "statica" di un documento XML), decommentare la stringa: /* else { Object.freeze(vResult); } */ . Il metodo Object.freeze vieta l'aggiunta di nuove proprietà e la rimozione delle proprietà esistenti, congelando la loro enumerabilità, la loro configurabilità o la loro scrivibilità. In sostanza l'oggetto è reso effettivamente immutabile.

È un approccio possibile nel caso in cui ci sia nota la struttura del documento XML che andremo a leggere.

Algoritmi inversi

È possibile invertire gli algoritmi qui proposti in maniera tale da ottenere un nuovo documento XML a partire da un albero di oggetti Javascript.

Per semplicità proporremo qui un unico esempio, che in un unico codice rappresenta l'inversione degli algoritmi #2 e #3. È molto semplice partire da esso per creare gli inversi anche degli algoritmi #1 e #4, qualora se ne abbia la necessità.

function createXML (oJXONObj) {
  function loadObj (oParentObj, oParentEl) {
    var nSameIdx, vValue, oChild;
    for (var sName in oParentObj) {
      vValue = oParentObj[sName];
      if (sName === "keyValue") {
        if (vValue !== null && vValue !== true) { oParentEl.appendChild(oNewDoc.createTextNode(String(vValue))); }
      } else if (sName.charAt(0) === "@") {
        oParentEl.setAttribute(sName.slice(1), vValue);
      } else {
        oChild = oNewDoc.createElement(sName);
        if (vValue.constructor === Date) {
          oChild.appendChild(oNewDoc.createTextNode(vValue.toGMTString()));
        } else if (vValue.constructor === Array) {
          for (nSameIdx = 0; nSameIdx < vValue.length; nSameIdx++) { loadObj(vValue[nSameIdx], oChild); }
        } else if (vValue instanceof Object) {
          loadObj(vValue, oChild);
        } else if (vValue !== null && vValue !== true) {
          oChild.appendChild(oNewDoc.createTextNode(vValue.toString()));
        }
        oParentEl.appendChild(oChild);
      }
    }
  }
  var oNewDoc = document.implementation.createDocument("", "", null);
  loadObj(oJXONObj, oNewDoc);
  return oNewDoc;
}

var newDoc = createXML(myObject);
// abbiamo ottenuto il nostro documento! provare per credere: alert((new XMLSerializer()).serializeToString(newDoc));

Nota: Con questo codice le istanze di Date eventualmente presenti verranno convertite in stringhe attraverso l'invocazione del metodo toGMTString(). Nulla vieta l'utilizzo di qualsiasi altro metodo di conversione. Inoltre le proprietà dell'albero con valore uguale a true verranno convertite in elementi privi di nodi di testo (si vedano le Considerazioni sul codice).

Si tratta di una buona soluzione nel caso si voglia automatizzare la creazione di un documento XML. È una cattiva scelta invece nel caso in cui si voglia ricostruire un documento XML già precedentemente convertito in JSON. Sebbene la conversione sia molto fedele (eccetto che per i nodi di tipo CDATASection, che verranno riconvertiti in nodi di tipo Text), si tratta di un processo inutilmente dispendioso. Nel caso infatti in cui il nostro scopo sia quello di modificare un documento XML, si raccomanda vivamente di lavorare su di esso invece che di crearne di nuovi.

La Convenzione di Parker

Le funzioni precedentemente elencate per la conversione di un documento XML in JSON (spesso chiamate «algoritmi JXON») sono più o meno liberamente basate sulla Convenzione di Parker. È chiamata “Convenzione di Parker”, in opposizione alla “Convenzione di BadgerFish”, sulla falsa riga del fumetto di Cuadrado Parker & Badger. Per ulteriori approfondimenti si veda anche la Convenzione di BadgerFish.

La seguente è una traduzione dall'inglese del paper originale della Convenzione di Parker (versione 0.4), dalla pagina “TransformingRules” del sito del progetto xml2json-xslt.

Questa convenzione è stata scritta per regolamentare la conversione in JSON da parte di XSLT, di conseguenza alcune parti di essa sono futili per Javascript.

Conversione in JSON

L'elemento root verrà assorbito, poiché ce ne può essere soltanto uno:
```
<root>test</root>
```
diventerà
```
"test"
```

I nomi degli elementi diventeranno proprietà di oggetti:

<root><name>Xml</name><encoding>ASCII</encoding></root>

diventerà

{
  "name": "Xml",
  "encoding": "ASCII"
}

I numeri saranno riconosciuti come tali (sia interi che decimali):

<root><age>12</age><height>1.73</height></root>

diventerà

{
  "age": 12,
  "height": 1.73
}

I booleani saranno riconosciuti come tali (case insensitive):

<root><checked>True</checked><answer>FALSE</answer></root>

diventerà

{
  "checked": true,
  "answer": false
}

Le stringhe avranno degli escape quando ce ne sarà la necessità:
```
<root>Quote: &quot; New-line:
</root>
```
diventerà
```
"Quote: \" New-line:\n"
```
Gli elementi vuoti diventeranno proprietà con valore nullo (null):
```
<root><nil/><empty></empty></root>
```
diventerà
```
{
  "nil": null,
  "empty": null
}
```

If all sibling elements have the same name, they become an array

<root><item>1</item><item>2</item><item>three</item></root>

becomes

[1, 2, "three"]

Mixed mode text-nodes, comments and attributes get absorbed:

<root version="1.0">testing<!--comment--><elementtest="true">1</element></root>

becomes

{ "element": true }

Namespaces get absorbed, and prefixes will just be part of the property name:

<root xmlns:ding="https://zanstra.com/ding"><ding:dong>binnen</ding:dong></root>

becomes

{ "ding:dong" : "binnen" }

Note: Our algorithms comply with the points 2, 3, 4 and 7. The third and the fourth algorithm comply also with the point 6 (but true instead of null – si vedano le Considerazioni sul codice). The point 5 is automatically managed by the Javascript method JSON.stringify.

Appendice Javascript

All the same as the JSON translation, but with these extra's:

Property names are only escaped when necessary

<root><while>true</while><wend>false</wend><only-if/></root>

becomes

{
  "while": true,
  wend: false,
  "only-if": null
}

Within a string, closing elements "</" are escaped as "<\/"

<root><![CDATA[<script>alert("YES");</script>]]></root>

becomes

{ script: "<script>alert(\"YES\")<\/script>" }

Dates are created as new Date() objects

<root>2006-12-25</root>

becomes

new Date(2006, 12 - 1, 25)

Attributes and comments are shown as comments (for testing-purposes):

<!--testing--><root><test version="1.0">123</test></root>

becomes

/* testing */ { test /* @version = "1.0" */ : 123}

A bit of indentation is done, to keep things ledgible

Note: Our algorithms comply with the point 3 (but without month decrease). The points 1 and 2 are automatically managed by the Javascript method JSON.stringify.

In sintesi

Prendiamo il terzo algoritmo come l'algoritmo di conversione JXON più rappresentativo. Un singolo nodo XML di tipo Element può avere in totale otto differenti configurazioni a seconda di quello che contiene. Esso può essere:

un elemento vuoto,
un elemento contenente al suo interno solamente un nodo di testo,
un elemento vuoto ma contenente attributi,
un elemento con attributi contenente al suo interno solamente un nodo di testo,
un elemento contenente ulteriori elementi-figli con nomi diversi,
un elemento contenente ulteriori elementi-figli con nomi uguali,
un elemento contenente ulteriori elementi-figli e un unico nodo di testo (testo contiguo),
un elemento contenente ulteriori elementi-figli e più nodi di testo (testo non contiguo).

The following table shows the corresponding conversion patterns between XML and JSON according to the third algorithm.

Case	XML	JSON	Javascript access
1	`<animal/>`	`"animal": true`	`myObject.animal`
2	`<animal>text</animal>`	`"animal": "text"`	`myObject.animal`
3	`<animal name="value" />`	`"animal": {"@name": "value"}`	`myObject.animal["@name"]`
4	`<animal name="value">text</animal>`	`"animal": { "@name": "value", "keyValue": "text" }`	`myObject.animal["@name"]`, `myObject.animal.keyValue`
5	`<animal> <dog>Charlie</dog> <cat>Deka</cat> </animal>`	`"animal": { "dog": "Charlie", "cat": "Deka" }`	`myObject.animal.dog`, `myObject.animal.cat`
6	`<animal> <dog>Charlie</dog> <dog>Mad Max</dog> </animal>`	`"animal": { "dog": ["Charlie", "Mad Max"] }`	`myObject.animal.dog[0]`, `myObject.animal.dog[1]`
7	`<animal> in my house <dog>Charlie</dog> </animal>`	`"animal": { "keyValue": "in my house", "dog": "Charlie" }`	`myObject.animal.keyValue`, `myObject.animal.dog`
8	`<animal> in my ho <dog>Charlie</dog> use </animal>`	`"animal": { "keyValue": "in my house", "dog": "Charlie" }`	`myObject.animal.keyValue`, `myObject.animal.dog`

Considerazioni sul codice

In these examples we chose to use a property named keyValue for the text content. The lack of standars for XML to JSON conversion leads developers to choose several property names for the text content of XML Element nodes which contain also other child nodes. Sometimes it is used a property called $. Other times it is used a property called #text. In the algorithms proposed here you can easily change this name, depending on your needs.

The choice of using a true value instead of a null value to represent empty nodes is due to the fact that when in an XML document there is an empty node the reason is often to express a Boolean content, as in this case:

<car>
  <type>Ferrari</type>
  <bought />
</car>

If the value were null it would be more cumbersome to launch a code like this:

if (myObject.car.bought) {
  // do something
}

According to our terzo algoritmo and our quarto algoritmo, just Text nodes or CDATASection nodes which contain nothing but white spaces (precisely: /^\s+$/) are parsed as null.

An important consideration is that, using the third or the fourth algorithm, an XML Document can be used to create any type of Javascript object. For example, If you want to create an object like the following:

{
  "bool": true,
  "array": ["Cinema", "Hot dogs", false],
  "object": {
    "nickname": "Jack",
    "registration_date": new Date(1995, 11, 25),
    "privileged_user": true
  },
  "num": 99,
  "text": "Hello World!"
}

you must just create an XML document with the following structure:

<bool>true</bool>
<array>Cinema</array>
<array>Hot dogs</array>
<array>false</array>
<object>
  <nickname>Jack</nickname>
  <registration_date>Dec 25, 1995</registration_date>
  <privileged_user />
</object>
<num>99</num>
<text>Hello World!</text>

This example also shows how the ideal JXON document is an XML document designed specifically to be converted in JSON format.

Costruire file a partire da istanze di `Document`

First, create a DOM tree as described in the Come creare un albero DOM article. If you have already have a DOM tree from using XMLHttpRequest, skip to the end of this section.

Now, let's serialize doc — the DOM tree — to a file (you can read more about using files in Mozilla):

var oFOStream = Components.classes["@mozilla.org/network/file-output-stream;1"].createInstance(Components.interfaces.nsIFileOutputStream);
var oFile = Components.classes["@mozilla.org/file/directory_service;1"].getService(Components.interfaces.nsIProperties).get("ProfD", Components.interfaces.nsILocalFile); // get profile folder
oFile.append("extensions"); // extensions sub-directory
oFile.append("{5872365E-67D1-4AFD-9480-FD293BEBD20D}"); // GUID of your extension
oFile.append("myXMLFile.xml"); // filename
oFOStream.init(oFile, 0x02 | 0x08 | 0x20, 0664, 0); // write, create, truncate
(new XMLSerializer()).serializeToStream(doc, oFOStream, ""); // rememeber, doc is the DOM tree
oFOStream.close();

Costruire file a partire da istanze di `XMLHttpRequest`

If you already have a DOM tree from using XMLHttpRequest, use the same code as above but replace serializer.serializeToStream(doc, oFOStream, "") with serializer.serializeToStream(xmlHttpRequest.responseXML.documentElement, oFOStream, "") where xmlHttpRequest is an instance of XMLHttpRequest.

Note that this first parses the XML retrieved from the server, then re-serializes it into a stream. Depending on your needs, you could just save the xmlHttpRequest.responseText directly.

Resources

Parsing and Serializing XML su XUL Planet

Costruire e decostruire un documento XML

Che cos'è un albero DOM?

Prima parte: costruire un albero DOM

Creare dinamicamente un albero DOM

Scrivendolo a mano

Automatizzando la creazione dinamica dell'albero DOM

Costruire un albero DOM XML a partire da stringhe di codice sorgente

Costruire un albero DOM a partire da un file

Usando DOMParser

Usando XMLHttpRequest

Usando l'elemento <object>.

Seconda parte: decostruire un albero DOM

Convertire un documento XML in stringhe di codice sorgente

Come ottenere stringhe di codice sorgente di facile lettura

Convertire un foglio XML in un albero di oggetti Javascript (JXON)

esempio.xml

Algoritmo #1: una via prolissa

Algoritmo #2: una via un po' meno prolissa

Algoritmo #3: una via sintetica

Algoritmo #4: una via davvero minimalista

Algoritmi inversi

La Convenzione di Parker

Conversione in JSON

Appendice Javascript

In sintesi

Considerazioni sul codice

Costruire file a partire da istanze di Document

Costruire file a partire da istanze di XMLHttpRequest

Resources

Tag del documento e collaboratori

Usando `DOMParser`

Usando `XMLHttpRequest`

Usando l'elemento `<object>`.

Costruire file a partire da istanze di `Document`

Costruire file a partire da istanze di `XMLHttpRequest`