Introducing “SiteTraverser”

SiteTraverser is a JavaScript class [1] that you can use to create bookmarklets that crawl websites looking for the presence or absence of certain features.

Here’s an example that utilises SiteTraverser (give it a click!):

Click to traverse!

All it does is list the titles of some pages on this site. The code behind the bookmarklet is quite simple:

var s = document.createElement('script'),
    t = setInterval(function(){
        if ( window.SiteTraverser ) {
            // Script is loaded, so stop polling
            clearInterval(t);
            // Let's start traversing
            new SiteTraverser({
                // Just for this demo, we're specifying the URLs.
                // Normally, SiteTraverser would just crawl in all directions.
                urls: ['/foo/1.html','/foo/2.html','/foo/3.html'],
                check: function(source, url){
                    // Capture everything between <title> and the next closing tag
                    var titleMatch = source.match(/<title>(.+?)<\//);
                    if ( titleMatch ) {
                        return this.success(
                            "Title found: " + '"' + titleMatch[1] + '"',
                            "URL: " + url
                        );
                    }
                    return this.failure("No title :(");
                }
            }).go();
        }
    }, 100);
s.src = ''; // URL of sitetraverser.js goes here
// Adding the element to the page is what starts the download
document.body.appendChild(s);

First, we load sitetraverser.js. Once window.SiteTraverser is available, we instantiate it (new SiteTraverser()) and run it (.go()).
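To run a script like this from the bookmarks bar, it has to be packed into a javascript: URL. Here is a minimal sketch of one way to do that. The helper name toBookmarklet is my own invention for illustration and is not part of SiteTraverser.

```javascript
// Hypothetical helper (not part of SiteTraverser): wrap a script in an
// immediately-invoked function and encode it as a javascript: URL.
function toBookmarklet(code) {
    // The IIFE keeps the script's variables out of the page's global scope.
    // The newline before the closing brace protects against a trailing
    // "//" comment in `code` swallowing the end of the wrapper.
    var wrapped = '(function(){' + code + '\n})();';
    return 'javascript:' + encodeURIComponent(wrapped);
}

// Usage: the resulting string goes in the href of a link that users can
// drag to their bookmarks bar.
var href = toBookmarklet("alert(document.title);");
```

Encoding the whole script keeps quotes, spaces and line breaks from breaking the URL. The wrapper function is a common bookmarklet habit rather than anything SiteTraverser requires.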

More complex checks can be performed. For example, you could crawl a site looking for empty image sources (why you’d want to do this), for unclosed tags, or for instances of inline JavaScript or CSS. You could do a whole bunch of things, actually, and you’re not limited to string operations on the source: if you wanted, you could create a DOM structure from the source and run wild!
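As a rough sketch of the empty-image-source idea, here is what such a check might look like. The helper names are hypothetical. The only SiteTraverser calls used are this.success and this.failure, in the same shape as in the demo above. Counting a missing src as "empty" is my assumption, and the DOM variant relies on the browser's DOMParser rather than anything SiteTraverser provides.

```javascript
// Hypothetical helper: return the <img> tags in `source` whose src
// attribute is empty, whitespace-only, or missing altogether.
function findEmptyImageSources(source) {
    var imgTags = source.match(/<img\b[^>]*>/gi) || [];
    return imgTags.filter(function (tag) {
        // Match src="...", src='...' or an unquoted src=value.
        // The leading \s avoids matching attributes such as data-src.
        var src = tag.match(/\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i);
        if (!src) { return true; } // no usable src attribute
        var value = src[1] !== undefined ? src[1]
                  : src[2] !== undefined ? src[2]
                  : src[3];
        return value.trim() === '';
    });
}

// A function shaped like the `check` option in the demo above.
function checkEmptyImages(source, url) {
    var offenders = findEmptyImageSources(source);
    if (offenders.length) {
        return this.failure(offenders.length + " image(s) with an empty src on " + url);
    }
    return this.success("No empty image sources", "URL: " + url);
}

// Browser-only alternative: build a DOM from the source and query it
// instead of relying on regular expressions.
function findEmptyImageSourcesViaDom(source) {
    var doc = new DOMParser().parseFromString(source, 'text/html');
    return Array.prototype.filter.call(doc.querySelectorAll('img'), function (img) {
        return (img.getAttribute('src') || '').trim() === '';
    });
}
```

The regular-expression version is fine for a quick audit but can be fooled by unusual markup. The DOM version lets the browser's own parser do the work, at the cost of only running where DOMParser exists.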

More information about SiteTraverser can be found on GitHub:

SiteTraverser on GitHub

[1] I was quite reluctant to call it a “class”, since JavaScript doesn’t support classes as they’re commonly known. However, it appears to be the best-fit term in this situation.