Posts Tagged ‘Comments’

JavaScript comment removal – revisited

Posted in 'Code Snippets, JavaScript' by James on September 11th, 2009

A while ago I posted a method I had been using at the time to remove comments from JavaScript code. It was pretty decent – instead of using a regular expression it steps through each character and removes comments where it finds them.

At the time I thought stepping through a string character-by-character was the only reliable way to solve the “comments problem”, but after giving it another attempt I found that it was possible with only a few regular expressions and a moderate dose of JavaScript’s replace() function.

Here it is:

function removeComments(str) {

    var uid = '_' + +new Date(),
        primitives = [],
        primIndex = 0;

    return (
        str
        /* Remove strings (the body may be empty, e.g. "" or '') */
        .replace(/(['"])(\\\1|.)*?\1/g, function(match){
            primitives[primIndex] = match;
            return uid + primIndex++;
        })

        /* Remove Regexes */
        .replace(/([^\/])(\/(?!\*|\/)(\\\/|.)+?\/[gim]{0,3})/g, function(match, $1, $2){
            primitives[primIndex] = $2;
            return $1 + uid + primIndex++;
        })

        /*
        - Remove single-line comments that contain would-be multi-line delimiters
            E.g. // Comment /* <--
        - Remove multi-line comments that contain would-be single-line delimiters
            E.g. /* // <--
        */
        .replace(/\/\/.*?\/?\*.+?(?=\n|\r|$)|\/\*[\s\S]*?\/\/[\s\S]*?\*\//g, '')

        /*
        Remove single and multi-line comments,
        no consideration of inner-contents
        */
        .replace(/\/\/.+?(?=\n|\r|$)|\/\*[\s\S]+?\*\//g, '')

        /*
        Remove multi-line comments that have a replaced ending (string/regex)
        Greedy, so no inner strings/regexes will stop it.
        */
        .replace(RegExp('\\/\\*[\\s\\S]+' + uid + '\\d+', 'g'), '')

        /* Bring back strings & regexes */
        .replace(RegExp(uid + '(\\d+)', 'g'), function(match, n){
            return primitives[n];
        })
    );

}

Theoretically this should work perfectly in almost all situations. Don’t bother even trying it with E4X as that definitely won’t work! E.g.

var someE4X = <box>// this is NOT a comment</box>;

It's impossible to cater to E4X with regular expressions because XML is a recursive structure. I'm not bothered though as E4X isn't exactly a widely used extension. It also doesn't play well with conditional compilation but frankly, conditional compilation shouldn't exist anyway.

Anyway, back to the solution. It takes a pretty conventional approach of removing all strings and regular expressions first and then moving on to the comments. Unfortunately comments are not as simple as \/\*.+?\*\/ - comment delimiters can show up inside strings, inside literal regular expressions and inside other comments.
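To see the protect-then-strip idea on its own, here is a cut-down sketch. It is not the function above: it only protects string literals (no regex literals), and the names stripCommentsSketch, token and saved are made up for illustration. It also assumes the token never occurs in the input and that no comment contains an unbalanced quote.

```javascript
// Simplified sketch of the placeholder technique: strings only, no regex literals.
function stripCommentsSketch(code) {
    var token = '_' + Date.now() + '_',   // assumed not to occur in the input
        saved = [];

    return code
        // 1. Swap each string literal for a numbered placeholder.
        .replace(/(['"])(?:\\[\s\S]|(?!\1)[^\\\n])*\1/g, function (match) {
            saved.push(match);
            return token + (saved.length - 1) + '_';
        })
        // 2. With the strings out of the way, strip block and line comments.
        .replace(/\/\*[\s\S]*?\*\/|\/\/[^\n\r]*/g, '')
        // 3. Put the original strings back in place of the placeholders.
        .replace(new RegExp(token + '(\\d+)_', 'g'), function (match, n) {
            return saved[n];
        });
}

var sample = 'var s = "// not a comment"; // real comment\n' +
             "var t = '/* nor this */'; /* gone */";
var cleaned = stripCommentsSketch(sample);
console.log(cleaned);
```

Both string literals survive untouched while the two real comments disappear. The order of the steps is the whole trick: once the strings are hidden behind placeholders, the comment regex has nothing to trip over.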

Rant: “Crappy comments”

Posted in 'General' by James on June 4th, 2009

Okay, this really annoys me sometimes; you read a blog post or article, it was interesting, you scroll to the comments section looking for an interesting discussion regarding the content of the post but all you find is countless grammatical or technical corrections contributed by “helpful” readers!

For me, the comments section of a blog post is where users can contribute thoughts concerning the material covered in the post; not spelling mistakes, or a complaint about the site’s content. Apparently not everyone feels this way, because whenever I go to read the comments there are always a few egotistical children leaving comments about the damn grammar!

Now, I’ve probably done it a couple of times in the past myself but I’ve stopped – I eventually realised I was only doing it to gratify my own needs; I wasn’t really interested in the grammatical correctness of any post – my ego just required a frequent pampering, apparently in the form of grammatical superiority.

Sometimes it may be appropriate to point out these mistakes; if the mistake will cause obvious grief to other readers who take heed from the post then by all means contribute your correction, but if it’s a tiny, barely-noticeable mistake then please keep it to yourself… or, if you feel your alter-ego “Grammar Nazi” kicking in then, by all means, send the owner of the site an angry email.

I have a secret; I delete non-spam comments sometimes! Yes, it’s true! Do you know why? Because this is my website and as such I feel it my responsibility to police the content of it – if a comment is defamatory or just downright rude then I’ll delete it, or if I feel the comment adds absolutely nothing to the post then I’ll delete it. For example, if your comment contains just one word like “Wow!” or “First!” or “Interesting!” then it’ll probably be deleted; why are you wasting your time writing such drivel?

I rarely have a problem though; this is a tiny blog with a tiny readership; something I’ve come to appreciate greatly!

I do feel sorry for the guys at Smashing Magazine! With over 100,000 subscribers they really do get some crap appearing in their comments!

Removing comments in JavaScript

Posted in 'JavaScript' by James on May 24th, 2009

For the recently developed debug.js I had to come up with a way to remove all comments from any piece of JavaScript code.

I originally thought that this would be a piece of cake; a simple regex takes care of everything!

code.replace(/\/\*.+?\*\/|\/\/.*(?=[\n\r])/g, '');

This regular expression would have worked in 90% of situations but, unfortunately, I had to build something that would work in every single situation.

It’s worth mentioning exactly when the above regular expression would fail:

  • When comment notation exists in a string, e.g.
      var str = " /* not a real comment */ ";
  • When comment notation exists in a literal regular expression, e.g.
      var regex = /\/*.*/;
  • When conditional compilation (supported in IE > 4) exists in the code, e.g.
      /*@cc_on @*/
      /*@if (@_jscript_version == 4)
      alert("JavaScript version 4");
      @else @*/
      alert("Blah blah blah");
      /*@end @*/

While the likelihood of any of the above happening is low, it’s certainly worth catering to all potential situations; just in case one of them arises!
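The first two failures are easy to reproduce. A minimal sketch, where naiveStrip is just a hypothetical wrapper around the one-liner above:

```javascript
// Hypothetical helper wrapping the naive regular expression from above.
function naiveStrip(code) {
    return code.replace(/\/\*.+?\*\/|\/\/.*(?=[\n\r])/g, '');
}

// Comment notation inside a string literal is wrongly removed:
var stringResult = naiveStrip('var str = " /* not a real comment */ ";\n');
console.log(stringResult); // var str = "  ";

// Part of a regex literal is mistaken for a block comment:
var regexResult = naiveStrip('var regex = /\\/*.*/;\n');
console.log(regexResult); // var regex = /\;
```

In the first case the string silently changes value; in the second the remaining code is no longer valid JavaScript.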

So, after a bit of googling and messing around, it turns out that the only way of doing this properly is to loop through the code, character by character, checking for certain delimiters and then enabling/disabling modes as the loop progresses: