A while ago I posted a ‘highlight’ script that could be used to highlight certain matches within a document. It used a regular expression to replace the innerHTML property of the specified container. Since then, prompted by a comment and various other things I’ve read, I’ve come to realize that it isn’t a robust solution and doesn’t hold up on realistically complex websites: a regular expression run over innerHTML can match inside tag names and attribute values, and reassigning innerHTML throws away any event handlers attached to the elements it replaces.
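To make the innerHTML problem concrete, here is a minimal illustration that treats a made-up snippet of markup as a plain string, which is effectively what a regex-over-innerHTML approach does:

```javascript
// A regex has no idea what is markup and what is text: the innerHTML of a
// container is just a string, so the pattern also matches inside attributes.
var html = '<a href="apple.html" title="apple">apple</a>';
var result = html.replace(/apple/g, '<b>$&</b>');
console.log(result);
// <a href="<b>apple</b>.html" title="<b>apple</b>"><b>apple</b></a>
```

Only the last replacement is the one you wanted; the other two corrupt the link’s attributes.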

The only viable solution is to walk the DOM tree recursively, stopping only at text nodes (nodeType === 3), and then apply the conventional ‘replace’ to each of those nodes.

The process is as follows:

  1. Loop through the child nodes of the target node (the container), starting with the last child and working backwards so that replacing a node doesn’t disturb the indices still to be visited.
  2. On each iteration, if the node is an element that isn’t on the exclude list, call the function again with that element as the new ‘searchNode’ (the process begins again for its children). If the node is a text node, continue.
  3. Check for a match (‘searchText’). If one exists, replace all occurrences with the return value of the ‘replacement’ callback (or with the replacement string); otherwise move on to the next node.
  4. The resulting string, with HTML in it, is injected via innerHTML into a newly created DIV element.
  5. Each child of the DIV element is then moved, one by one, into a document fragment.
  6. The document fragment is inserted before the current node, which is then removed.
  7. The loop continues until all child nodes of the ‘searchNode’ have been searched.
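The traversal in steps 1 to 3 can be tried outside a browser. The sketch below is a hypothetical, DOM-free stand-in: nodes are plain objects with nodeType, nodeName, data and childNodes, and instead of replacing anything it simply collects the text nodes that would be tested, skipping excluded elements. It walks forwards because it only reads the tree; the real function walks backwards because it modifies it.

```javascript
// Hypothetical plain-object nodes: { nodeType, nodeName, data, childNodes }.
var TEXT_NODE = 3, ELEMENT_NODE = 1;
var EXCLUDES = ',html,head,style,title,link,meta,script,object,iframe,';

// Collect the data of every text node under `node`, skipping excluded elements.
function collectTextNodes(node, found) {
    found = found || [];
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        if (child.nodeType === ELEMENT_NODE &&
            EXCLUDES.indexOf(',' + child.nodeName.toLowerCase() + ',') === -1) {
            collectTextNodes(child, found);   // recurse into allowed elements
        } else if (child.nodeType === TEXT_NODE) {
            found.push(child.data);           // this is where the match test would run
        }
    }
    return found;
}

// Tiny helpers for building the mock tree.
function text(data) { return { nodeType: TEXT_NODE, data: data }; }
function el(name, kids) { return { nodeType: ELEMENT_NODE, nodeName: name, childNodes: kids }; }

var body = el('BODY', [
    text('Hello '),
    el('STRONG', [text('world')]),
    el('SCRIPT', [text('var x = 1;')]),
    el('A', [text('link')])
]);

console.log(collectTextNodes(body)); // [ 'Hello ', 'world', 'link' ]
```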

Here’s the function itself:

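function findAndReplace(searchText, replacement, searchNode) {
    if (!searchText || typeof replacement === 'undefined') {
        // Throw an error here if you want...
        return;
    }
    var regex = typeof searchText === 'string' ?
                new RegExp(searchText, 'g') : searchText,
        childNodes = (searchNode || document.body).childNodes,
        cnLength = childNodes.length,
        // Wrapped in commas so that, e.g., 'a' can't match the tail of 'meta'
        excludes = ',html,head,style,title,link,meta,script,object,iframe,';
    // Walk backwards: replacing a node only shifts the indices of later siblings
    while (cnLength--) {
        var currentNode = childNodes[cnLength];
        // Recurse into element nodes that aren't on the exclude list
        if (currentNode.nodeType === 1 &&
            excludes.indexOf(',' + currentNode.nodeName.toLowerCase() + ',') === -1) {
            findAndReplace(searchText, replacement, currentNode);
        }
        // A global regex keeps its position between test() calls, so reset it
        regex.lastIndex = 0;
        // Skip anything that isn't a text node containing a match
        if (currentNode.nodeType !== 3 || !regex.test(currentNode.data)) {
            continue;
        }
        var parent = currentNode.parentNode,
            frag = (function(){
                // Note: the node's text is re-parsed as HTML here, so a literal
                // '<' or '&' in it will be interpreted as markup.
                var html = currentNode.data.replace(regex, replacement),
                    wrap = document.createElement('div'),
                    frag = document.createDocumentFragment();
                wrap.innerHTML = html;
                // Move the parsed nodes out of the wrapper into the fragment
                while (wrap.firstChild) {
                    frag.appendChild(wrap.firstChild);
                }
                return frag;
            })();
        // Swap the original text node for the new nodes
        parent.insertBefore(frag, currentNode);
        parent.removeChild(currentNode);
    }
}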
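The function tests each text node with regex.test() and then hands the same RegExp object to String.prototype.replace. When the pattern has the global flag (string patterns always get it here), the RegExp object carries state between calls in its lastIndex property, which matters if you pass in a regex you have already used elsewhere or adapt the function. A small standalone illustration, with made-up strings:

```javascript
// A RegExp with the global flag remembers where its last match ended
// (lastIndex), and test() resumes searching from there on the next call.
var re = /apple/g;
var first = re.test('apple pie');   // true; re.lastIndex is now 5
var second = re.test('apple tart'); // false: the search began at index 5
console.log(first, second, re.lastIndex); // true false 0

// Resetting lastIndex before testing a new string avoids the carried-over state.
re.lastIndex = 0;
console.log(re.test('apple tart')); // true
```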


No library or framework is required; the function is entirely stand-alone. It takes two required parameters and a third, optional one:

  • searchText – This can be either a string or a regular expression; either way it ends up as a RegExp object. Searching for the word “and” on its own isn’t enough, because every word containing “and” (such as “sand” or “band”) would match too. To match whole words only, test for word boundaries by passing either the string '\\band\\b' (backslashes doubled, as in a JavaScript string literal) or the regular expression /\band\b/g. A string is always given the global flag, but a RegExp you pass in is used as-is, so remember to include the g flag yourself.
  • replacement – This is passed straight to String.prototype.replace, so it can be either a string (using $1, $2, $3, etc. for backreferences) or a function whose return value replaces each match.
  • searchNode – This is mainly for internal use, but you can specify the node under which the search takes place if you want to. It defaults to document.body.
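Since the function ultimately runs String.prototype.replace on each text node’s data, the effect of word boundaries, the global flag and the two replacement forms can be checked on a plain string (the sample text is invented):

```javascript
var text = 'sand and band';

// Without word boundaries every "and" matches, even inside other words.
console.log(text.replace(/and/g, '[$&]'));                       // s[and] [and] b[and]

// The string form needs doubled backslashes; new RegExp(..., 'g') turns it
// into the same pattern as the literal /\band\b/g.
console.log(text.replace(new RegExp('\\band\\b', 'g'), '[$&]')); // sand [and] band

// A regex literal without the g flag replaces only the first match.
console.log('and and'.replace(/\band\b/, '[$&]'));               // [and] and

// The replacement can also be a function; it receives the matched text first.
console.log(text.replace(/\band\b/g, function (m) { return m.toUpperCase(); })); // sand AND band
```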

A typical use is highlighting the search keywords a visitor arrived with. Here’s how that would work:

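// Just an example: highlight the term a visitor searched for
var searchMatch = document.referrer.match(/[?&]q=([^&]+)/),
    searchTerm = searchMatch &&
        // The query string is URL-encoded, with '+' standing in for spaces
        decodeURIComponent(searchMatch[1].replace(/\+/g, ' '))
            // Escape regex metacharacters so the term is matched literally
            .replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
if (searchTerm) {
    findAndReplace('\\b' + searchTerm + '\\b', function(term){
        return '<span class="keyword">' + term + '</span>';
    });
}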

As mentioned, the second parameter can also be a string, in which case you can use $1, $2, etc. for backreferences:

findAndReplace('(microsoft|apple|sony)', '<a href="http://$1.com">$1</a>');
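To see exactly what that call produces for a single text node before the result is parsed via innerHTML, the same pattern and replacement string can be applied to some sample text directly:

```javascript
// The HTML string built for one (made-up) text node's data; findAndReplace
// would then parse this into nodes and swap it in for the original text node.
var data = 'I own gear from apple and sony.';
var html = data.replace(new RegExp('(microsoft|apple|sony)', 'g'),
                        '<a href="http://$1.com">$1</a>');
console.log(html);
// I own gear from <a href="http://apple.com">apple</a> and <a href="http://sony.com">sony</a>.
```

Note that the pattern is case-sensitive, so “Apple” would be left alone unless you pass a RegExp with the i flag (plus g).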

You’ll notice that within the function there’s an ‘excludes’ string containing a comma-separated list of node names to exclude from all searches. You can add to or remove from this list as needed.
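When editing the list, keep in mind how a comma-delimited lookup like this behaves: it needs a delimiter on both sides of each name. Appending a comma only at the end lets a short name match the tail of a longer one. A standalone check, using a hypothetical isExcluded helper:

```javascript
var excludes = 'html,head,style,title,link,meta,script,object,iframe';

// Guarding only the end of each name: "a," is found inside "meta,",
// so <a> elements would wrongly be treated as excluded.
var naive = (excludes + ',').indexOf('a,') !== -1;
console.log(naive); // true

// Wrapping both the list and the name in commas matches whole names only.
function isExcluded(nodeName) {
    return (',' + excludes + ',').indexOf(',' + nodeName.toLowerCase() + ',') !== -1;
}
console.log(isExcluded('A'), isExcluded('SCRIPT'), isExcluded('META')); // false true true
```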

Porting this to MooTools or jQuery offers little benefit, since neither library does much for text-node traversal, but feel free to wrap it all up in the respective namespace.

One notable limitation is that the function cannot find text that spans separate nodes. For example, searching for “pineapple” in the following HTML would not work:

We ate mango, pine<strong>apple</strong> and passion fruit!!

I’ve tried to find ways around this, but it seems a lost cause.
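One possible starting point, offered only as a sketch and not something findAndReplace does, is to search the concatenated text of neighbouring text nodes and map each match back to the nodes it spans. The hypothetical locateAcrossNodes helper below works on plain strings standing in for the text nodes’ data. Turning those positions into highlighted markup across element boundaries (for example by splitting text nodes with the DOM’s Text.splitText() and wrapping each piece) is the hard part, and it is left out here.

```javascript
// Hypothetical helper: locate matches across adjacent text nodes by searching
// their concatenated text, then map each match back to [nodeIndex, offset].
// It only reports positions; no wrapping or DOM changes are attempted.
function locateAcrossNodes(texts, regex) {
    // Use a global copy so exec() advances and the caller's regex is untouched
    var flags = regex.flags.indexOf('g') === -1 ? regex.flags + 'g' : regex.flags,
        re = new RegExp(regex.source, flags),
        joined = texts.join(''),
        starts = [],
        pos = 0,
        results = [],
        m;

    // Record where each node's text begins in the joined string
    texts.forEach(function (t) {
        starts.push(pos);
        pos += t.length;
    });

    // A start offset belongs to the last node beginning at or before it;
    // an end offset belongs to the last node beginning strictly before it.
    function locate(offset, isEnd) {
        for (var i = texts.length - 1; i > 0; i--) {
            if (isEnd ? offset > starts[i] : offset >= starts[i]) {
                return [i, offset - starts[i]];
            }
        }
        return [0, offset];
    }

    while ((m = re.exec(joined)) !== null) {
        results.push({
            match: m[0],
            start: locate(m.index, false),
            end: locate(m.index + m[0].length, true)
        });
        if (m[0].length === 0) re.lastIndex++; // don't loop forever on empty matches
    }
    return results;
}

// The text nodes from the example above: "pine" and "apple" live in different nodes.
var parts = ['We ate mango, pine', 'apple', ' and passion fruit!!'];
var found = locateAcrossNodes(parts, /pineapple/);
console.log(JSON.stringify(found));
// [{"match":"pineapple","start":[0,14],"end":[1,5]}]
```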