Introducing “SiteTraverser”

SiteTraverser is a JavaScript class¹ that you can use to create bookmarklets that crawl websites, checking for the presence or absence of certain features.

Here’s an example that utilises SiteTraverser (give it a click!):

Click to traverse!

All it does is list the titles of some pages on this site. The code behind the bookmarklet is quite simple:

var s = document.createElement('script'),
    t = setInterval(function(){
 
        if ( window.SiteTraverser ) {
 
            // Script is loaded
 
            clearInterval(t);
            s.parentNode.removeChild(s);
 
            // Let's start traversing
            new SiteTraverser({
 
                // Just for this demo, we're specifying the URLs.
                // Normally, SiteTraverser would just crawl in all directions.
                urls: ['/foo/1.html','/foo/2.html','/foo/3.html'],
 
                check: function(source, url){
 
                    // Capture the text between <title> and the next closing tag.
                    var titleMatch = source.match(/<title>(.+?)<\//);
 
                    if ( titleMatch ) {
                        return this.success(
                            "Title found: " + '"' + titleMatch[1] + '"',
                            "URL: " + url
                        );
                    }
 
                    return this.failure("No title :(");
 
                }
            }).go();
 
        }
 
    }, 100);
 
s.src = 'http://qd9.co.uk/projects/SiteTraverser/sitetraverser.js';
document.body.appendChild(s);

First, we load sitetraverser.js, polling every 100ms until window.SiteTraverser is available. Then we instantiate it (new SiteTraverser()) and run it (.go()).

More complex checks can be performed. For example, you could crawl a site looking for empty image sources, unclosed tags, or instances of inline JavaScript or CSS. You could do a whole bunch of things, actually, and you're not limited to string operations on the source: if you want, you can create a DOM structure from the source and run wild!
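As a rough sketch of the first idea, a check for empty image sources might look something like this. The names findEmptyImageSources and checkEmptyImages are made up for illustration, and I'm assuming failure() takes a single message, as it does in the demo above. It sticks to plain string matching, so it will be fooled by a ">" inside an attribute value.

```javascript
// Hypothetical helper (not part of SiteTraverser): returns every <img> tag
// in the source whose src attribute is present but has an empty value.
function findEmptyImageSources(source) {
    var imgTags = source.match(/<img\b[^>]*>/gi) || [];

    return imgTags.filter(function (tag) {
        // Matches src="", src='' and a bare "src=" followed by whitespace or ">".
        // The leading \s keeps attributes such as data-src from matching.
        return /\ssrc\s*=\s*(?:""|''|(?=[\s>]))/i.test(tag);
    });
}

// A check callback in the same shape as the demo: it receives the page
// source and URL, and reports through this.success / this.failure.
function checkEmptyImages(source, url) {
    var offenders = findEmptyImageSources(source);

    if (offenders.length) {
        return this.failure(
            offenders.length + " empty image source(s) found at " + url
        );
    }

    return this.success("No empty image sources", "URL: " + url);
}
```

To try it, you'd pass checkEmptyImages as the check option when instantiating SiteTraverser, in place of the title check from the demo.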

More information about SiteTraverser can be viewed on Github:

SiteTraverser on Github

¹ I was quite reluctant to call it a “class”, since JavaScript doesn’t support classes as they’re commonly known. However, it appears to be the best-fit term in this situation.


Supersha December 3rd, 2009 at 3:20 am

It’s perfect!! But why not run the “go” method in the constructor? Maybe that would look a little better.

Paul Irish December 3rd, 2009 at 3:03 pm

Most excellent.

My common use case for something like this is to help identify all of the A tags with only a # href.
It’s very common for developers to use those with the faulty expectation of fixing them later. 🙂

Mat December 3rd, 2009 at 3:57 pm

Very nice!
Drag and drop doesn’t work on Mac OS X, since Ctrl + click acts as a right click on a Mac.

James December 3rd, 2009 at 10:19 pm

@Supersha, thanks. The go() method is there to give you control over when the crawling begins. In some situations you’ll want to instantiate early but not start until a bit later. Plus, it allows you to go instance.go().stop().go() 😀

@Paul, let me know if you end up using it. 🙂

@mat, Ahh, forgot about Macs. I’ve changed it a bit, so it no longer uses CTRL to initiate dragging – you can now click anywhere on the top or edge of the box to drag it, without needing to press any keys.

So far there's been 4 Responses to “Introducing “SiteTraverser””
