Restore exact innerHTML to DOM

I’d like to save the html string of the DOM, and later restore it to be exactly the same. The code looks something like this:

var stringified = document.documentElement.innerHTML
// later, after serializing and deserializing
document.documentElement.innerHTML = stringified

This works when everything is perfect, but when the DOM is not W3C-compliant, there’s a problem. The first line works fine: stringified matches the DOM exactly. But when I restore from the (non-compliant) stringified, the browser repairs the markup while parsing it, and the resulting DOM is not the same as it was originally.

For example, if my original DOM looks like

    <p><div></div></p>
    

    then the final DOM will look like

    <p></p><div></div><p></p>
    

    since div elements are not allowed to be inside p elements. Is there some way I can get the browser to use the same html parsing that it does on page load and accept broken html as-is?

    Why is the html broken in the first place? The DOM is not controlled by me.

    Here’s a jsfiddle to show the behavior http://jsfiddle.net/b2x7rnfm/5/. Open your console.

    <body>
        <div id="asdf"><p id="outer"></p></div>
        <script type="text/javascript">
            var insert = document.createElement('div');
            var text = document.createTextNode('ladygaga');
            insert.appendChild(text);
            document.getElementById('outer').appendChild(insert);
            var e = document.getElementById('asdf')
            console.log(e.innerHTML);
            e.innerHTML = e.innerHTML;
            console.log(e.innerHTML); // This is different than 2 lines above!!
        </script>
    </body>
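
One way to check ahead of time whether a subtree will survive an innerHTML round trip is to reparse its markup into a detached shallow clone and compare the result with the original using isEqualNode. This is only a sketch: the function name survivesInnerHtmlRoundTrip is made up for illustration, and it assumes a browser DOM element is passed in.

```javascript
// Returns true if reparsing element.innerHTML yields an equivalent subtree.
// `survivesInnerHtmlRoundTrip` is a hypothetical helper name.
function survivesInnerHtmlRoundTrip(element) {
  // Shallow clone: same tag and attributes, no children, not in the document.
  var probe = element.cloneNode(false);
  // Let the HTML parser rebuild the children from the serialized markup.
  probe.innerHTML = element.innerHTML;
  // isEqualNode compares node types, names, attributes and children recursively.
  return probe.isEqualNode(element);
}
```

For the fiddle above, calling survivesInnerHtmlRoundTrip(document.getElementById('asdf')) after the div has been appended should return false, because the parser splits the p around the div.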
    

7 Solutions for “Restore exact innerHTML to DOM”

    If you need to be able to save and restore an invalid HTML structure, you could do it by way of XML. The code which follows comes from this fiddle.

    To save, you create a new XML document to which you add the nodes you want to serialize:

    var asdf = document.getElementById("asdf");
    var outer = document.getElementById("outer");
    var add = document.getElementById("add");
    var save = document.getElementById("save");
    var restore = document.getElementById("restore");
    
    var saved = undefined;
    save.addEventListener("click", function () {
      if (saved !== undefined)
        return; // Do not overwrite
    
      // Create a fake document with a single top-level element, as 
      // required by XML.    
      var parser = new DOMParser();
      var doc = parser.parseFromString("<top/>", "text/xml");
    
      // We could skip the cloning and just move the nodes to the XML
      // document. This would have the effect of saving and removing 
      // at the same time but I wanted to show what saving while 
      // preserving the data would look like    
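      var clone = asdf.cloneNode(true);
      var top = doc.firstChild;
      // Move the children of the clone (not of the original) into the
      // XML document, so the original stays intact while saving.
      var child = clone.firstChild;
      while (child) {
        top.appendChild(child);
        child = clone.firstChild;
      }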
      saved = top.innerHTML;
      console.log("saved as: ", saved);
    
      // Perform the removal here.
      asdf.innerHTML = "";
    });
    

    To restore, you create an XML document to deserialize what you saved and then add the nodes to your document:

    restore.addEventListener("click", function () {
      if (saved === undefined)
          return; // Don't restore undefined data!
    
      // We parse the XML we saved.
      var parser = new DOMParser();
      var doc = parser.parseFromString("<top>" + saved + "</top>", "text/xml");
      var top = doc.firstChild;
    
      var child = top.firstChild;
      while (child) {
        asdf.appendChild(child);
        // Remove the extra junk added by the XML parser.
        // Only elements have attributes; text and comment nodes do not.
        if (child.nodeType === 1) {
          child.removeAttribute("xmlns");
        }
        child = top.firstChild;
      }
      saved = undefined;
      console.log("inner html after restore", asdf.innerHTML);
    });
    

    Using the fiddle, you can:

    1. Press the “Add LadyGaga…” button to create the invalid HTML.

    2. Press “Save and Remove from Document” to save the structure in asdf and clear its contents. This prints to the console what was saved.

    3. Press “Restore” to restore the structure that was saved.

    The code above aims to be general. It could be simplified if some assumptions can be made about the HTML structure to be saved. For instance, a bare fragment like blah is not a well-formed XML document, because XML requires a single top-level element. So the code above takes pains to add a top-level element (top) to prevent this problem. It is also generally not possible to parse an HTML serialization as XML, which is why the save operation serializes to XML.

    This is a proof-of-concept more than anything. There could be side-effects from moving nodes created in an HTML document to an XML document or the other way around that I have not anticipated. I’ve run the code above on Chrome and FF. I don’t have IE at hand to run it there.
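
To illustrate the point about HTML serializations not being parseable as XML, here is a small sketch that parses a string and fails loudly when the input is not well-formed. The helper name parseXmlStrict is hypothetical, and the parser is passed in so that a DOMParser instance can be supplied in the browser. For XML input, parseFromString does not throw on malformed markup; it returns a document containing a parsererror element, so the sketch checks for that.

```javascript
// Parse `xml` and throw if it is not well-formed.
// `parseXmlStrict` is a hypothetical helper; `parser` is expected to be
// a DOMParser instance (or anything with a compatible parseFromString).
function parseXmlStrict(xml, parser) {
  var doc = parser.parseFromString(xml, "application/xml");
  // Malformed input is reported through a <parsererror> element.
  var error = doc.querySelector("parsererror");
  if (error) {
    throw new Error("Not well-formed XML: " + error.textContent);
  }
  return doc;
}
```

In the restore handler this could be used as parseXmlStrict("<top>" + saved + "</top>", new DOMParser()). A typical HTML serialization such as "<top><br></top>" would make it throw, because the br element is never closed.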

    This won’t work for your most recent clarification, that you must have a string copy. Leaving it, though, for others who may have more flexibility.


    Since using the DOM seems to allow you to preserve, to some degree, the invalid structure, and using innerHTML involves reparsing with (as you’ve observed) side-effects, we have to look at not using innerHTML:

    You can clone the original, and then swap in the clone:

    var e = document.getElementById('asdf')
    snippet.log("1: " + e.innerHTML);
    var clone = e.cloneNode(true);
    var insert = document.createElement('div');
    var text = document.createTextNode('ladygaga');
    insert.appendChild(text);
    document.getElementById('outer').appendChild(insert);
    snippet.log("2: " + e.innerHTML);
    e.parentNode.replaceChild(clone, e);
    e = clone;
    snippet.log("3: " + e.innerHTML);
    

    Live example (the same script as above, run against this markup):

    <div id="asdf">
      <p id="outer">
        <div>ladygaga</div>
      </p>
    </div>
    
    <!-- Script provides the `snippet` object, see http://meta.stackexchange.com/a/242144/134069 -->
    <script src="http://tjcrowder.github.io/simple-snippets-console/snippet.js"></script>
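
If a string copy is a hard requirement, one alternative that still avoids the HTML parser is to serialize the tree itself rather than its markup, for example as JSON, and rebuild it with createElement and createTextNode. The following is a minimal sketch under several assumptions: all function names are invented for illustration, only element, text, and comment nodes are handled, and namespaces (SVG, MathML), event listeners, and form state are ignored.

```javascript
// Convert a DOM node into a plain object that JSON.stringify can handle.
function nodeToPlain(node) {
  if (node.nodeType === 3) { // text node
    return { type: "text", value: node.nodeValue };
  }
  if (node.nodeType === 8) { // comment node
    return { type: "comment", value: node.nodeValue };
  }
  if (node.nodeType !== 1) {
    return null; // other node kinds are skipped in this sketch
  }
  var attrs = [];
  for (var i = 0; i < node.attributes.length; i++) {
    attrs.push([node.attributes[i].name, node.attributes[i].value]);
  }
  return {
    type: "element",
    tag: node.localName,
    attrs: attrs,
    children: childrenToPlain(node)
  };
}

// Convert all children of `parent`, preserving their order.
function childrenToPlain(parent) {
  var list = [];
  for (var c = parent.firstChild; c; c = c.nextSibling) {
    var plain = nodeToPlain(c);
    if (plain) list.push(plain);
  }
  return list;
}

// Rebuild a node with DOM calls only, so no HTML parsing rules apply.
function plainToNode(plain, doc) {
  if (plain.type === "text") return doc.createTextNode(plain.value);
  if (plain.type === "comment") return doc.createComment(plain.value);
  var el = doc.createElement(plain.tag);
  plain.attrs.forEach(function (pair) {
    el.setAttribute(pair[0], pair[1]);
  });
  plain.children.forEach(function (child) {
    el.appendChild(plainToNode(child, doc));
  });
  return el;
}

// Serialize the children of `element` to a JSON string.
function saveChildren(element) {
  return JSON.stringify(childrenToPlain(element));
}

// Replace the children of `element` with the saved structure.
function restoreChildren(element, json) {
  while (element.firstChild) element.removeChild(element.firstChild);
  JSON.parse(json).forEach(function (plain) {
    element.appendChild(plainToNode(plain, element.ownerDocument));
  });
}
```

In the browser this would be used as var saved = saveChildren(document.getElementById('asdf')); and later restoreChildren(document.getElementById('asdf'), saved);. Because the nodes are created one by one, a div nested inside a p stays nested.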

    Try using Blob and URL.createObjectURL to export the HTML. Include a script tag in the exported HTML that removes the extra <div></div><p></p> elements from the rendered document.

    html

    <body>
        <div id="asdf">
            <p id="outer"></p>
        </div>
        <script>
            var insert = document.createElement("div");
            var text = document.createTextNode("ladygaga");
            insert.appendChild(text);
            document.getElementById("outer").appendChild(insert);
            var elem = document.getElementById("asdf");
            var r = document.querySelectorAll("[id=outer] ~ *");
            // remove last `div` , `p` elements from `#asdf`
            for (var i = 0; i < r.length; ++i) {
                elem.removeChild(r[i])
            }
        </script>
    </body>
    

    js

    var e = document.getElementById("asdf");   
    var html = e.outerHTML;  
    console.log(document.body.outerHTML);   
    var blob = new Blob([document.body.outerHTML], {
        type: "text/html"
    });   
    var objUrl = window.URL.createObjectURL(blob);
    var popup = window.open(objUrl, "popup", "width=300, height=200");
    

    jsfiddle http://jsfiddle.net/b2x7rnfm/11/

    See this example: http://jsfiddle.net/kevalbhatt18/1Lcgaprc/

    MDN cloneNode

    var e = document.getElementById('asdf');
    console.log(e.innerHTML);
    var backupElem = e.cloneNode(true);
    // Your tinkering with the original
    e.parentNode.replaceChild(backupElem, e);
    // `e` is now detached; the restored copy in the document is backupElem.
    console.log(backupElem.innerHTML);
    

    You cannot expect the parser to leave non-compliant HTML as-is. But since the structure the parser produces from non-compliant HTML is very predictable, you can write a function that makes the HTML non-compliant again, like this:

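    function ruinTheHtml() {
      // Snapshot the live collection, since the loop below mutates the DOM.
      var allElements = Array.prototype.slice.call(
        document.body.getElementsByTagName("*"));

      allElements.forEach(function (el) {
        if (el.tagName === 'SCRIPT' || el.tagName === 'STYLE') return;
        if (el.textContent !== '') return;

        var next = el.nextSibling;
        var afterNext = next && next.nextSibling;
        // Skip elements that were already removed or lack two following siblings.
        if (!el.parentNode || !afterNext) return;

        if (afterNext.textContent === '') {
          // Drop the empty trailing element and nest the middle one again.
          el.parentNode.removeChild(afterNext);
          el.appendChild(next);
        }
      });
    }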
    

    See the fiddle:
    http://jsfiddle.net/pqah8e25/3/

    You have to clone the node instead of copying the HTML. The parsing rules force the browser to close an open p when it sees a div start tag.

    If you really need to get HTML from that string and it is valid XML, then you can use the following code ($ is jQuery):

    var html = "<p><div></div></p>";
    var div = document.createElement("div");
    var xml = $.parseXML(html);
    div.appendChild(xml.documentElement);
    div.innerHTML === html // true
    

    You can use outerHTML, it preserves the original structure:

    (based on your original sample)

    <div id="asdf"><p id="outer"></p></div>
    
    <script type="text/javascript">
        var insert = document.createElement('div');
        var text = document.createTextNode('ladygaga');
        insert.appendChild(text);
        document.getElementById('outer').appendChild(insert);
        var e = document.getElementById('asdf')
        console.log(e.outerHTML);
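        e.outerHTML = e.outerHTML;
        // Note: after the assignment, `e` still refers to the original element,
        // which is now detached from the document.
        console.log(e.outerHTML);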
    </script>
    

    Demo: http://jsfiddle.net/b2x7rnfm/7