For the recently developed
I originally thought that this would be a piece of cake; a simple regex takes care of everything!
This regular expression would have worked in 90% of situations but, unfortunately, I had to build something that would work in every single situation.
It’s worth mentioning exactly when the above regular expression would fail:
- When comment notation exists in a string, e.g. var str = " /* not a real comment */ ";
- When comment notation exists in a literal regular expression, e.g. var regex = /\/*.*/;
- When conditional compilation (supported in IE > 4) exists in the code.
While the likelihood of any of the above happening is low, it’s certainly worth catering to all potential situations, just in case one of them arises!
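To make the first failure case concrete, here is a minimal sketch. The pattern and the name naiveStripComments are assumptions made up for illustration; this is not necessarily the exact expression referred to above, just a typical regex-only attempt.

```javascript
// Hypothetical naive pattern: match a block comment or a line comment.
var naiveCommentPattern = /\/\*[\s\S]*?\*\/|\/\/.*/g;

// Hypothetical helper: strips anything that merely looks like a comment.
function naiveStripComments(code) {
  return code.replace(naiveCommentPattern, '');
}

// Comment notation inside a string is removed along with real comments:
console.log(naiveStripComments('var str = " /* not a real comment */ ";'));
// -> var str = "  ";

// A URL inside a string is cut off at the double slash:
console.log(naiveStripComments('var url = "http://example.com";'));
// -> var url = "http:
```

The regex has no notion of being "inside" a string or a regular expression literal, which is exactly the context a character-by-character pass can track.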
So, after a bit of googling and messing around, it turns out that the only way of doing this properly is to loop through the code, character by character, checking for certain delimiters and then enabling/disabling modes as the loop progresses:
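The original listing is not reproduced here, so what follows is only a minimal sketch of the approach just described, not the exact function. The mode names match the ones discussed in this post; everything else is an assumption. In particular, this sketch skips the character after a backslash instead of looking back at the previous character, and it treats any '/' that does not start a comment as the start of a regular expression, so a division operator can confuse it.

```javascript
// Sketch of a character-by-character comment stripper (assumed details, see above).
function removeComments(source) {
  // Pad both ends so look-behind (str[i-2]) and look-ahead (str[i+2])
  // always land on a real array element.
  var str = ('__' + source + '__').split('');
  var mode = {
    singleQuote: false, doubleQuote: false, regex: false,
    blockComment: false, lineComment: false, condComp: false
  };
  var end = str.length - 2; // stop before the trailing padding

  for (var i = 2; i < end; i++) {
    var c = str[i];

    if (mode.singleQuote) {
      if (c === '\\') { i++; continue; }        // skip the escaped character
      if (c === "'") { mode.singleQuote = false; }
      continue;
    }
    if (mode.doubleQuote) {
      if (c === '\\') { i++; continue; }        // skip the escaped character
      if (c === '"') { mode.doubleQuote = false; }
      continue;
    }
    if (mode.regex) {
      if (c === '\\') { i++; continue; }        // skip the escaped character
      // A regex literal cannot span lines, so a newline also ends the mode.
      if (c === '/' || c === '\n') { mode.regex = false; }
      continue;
    }
    if (mode.blockComment) {
      if (c === '*' && str[i + 1] === '/') {
        str[i + 1] = '';                        // remove the closing slash too
        mode.blockComment = false;
      }
      str[i] = '';
      continue;
    }
    if (mode.lineComment) {
      if (c === '\n' || c === '\r') { mode.lineComment = false; } // keep the newline
      else { str[i] = ''; }
      continue;
    }
    if (mode.condComp) {
      // Conditional compilation is left untouched; it ends with @*/
      if (str[i - 2] === '@' && str[i - 1] === '*' && c === '/') { mode.condComp = false; }
      continue;
    }

    // No mode active: look for a delimiter that starts one.
    if (c === '"') { mode.doubleQuote = true; continue; }
    if (c === "'") { mode.singleQuote = true; continue; }
    if (c === '/') {
      if (str[i + 1] === '*' && str[i + 2] === '@') { mode.condComp = true; continue; }
      if (str[i + 1] === '*') {
        str[i] = ''; str[i + 1] = '';           // remove the opening /*
        i++;                                    // so "/*/" is not read as a whole comment
        mode.blockComment = true;
        continue;
      }
      if (str[i + 1] === '/') { str[i] = ''; mode.lineComment = true; continue; }
      mode.regex = true;
    }
  }
  return str.join('').slice(2, -2);             // drop the padding
}

console.log(removeComments('var str = " /* not a real comment */ "; // real one'));
```

Removed characters are replaced with empty strings in place rather than spliced out, so the indices used for looking ahead and behind stay valid for the whole pass.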
The best way to wrap your head round the above code is to literally take it step by step. There are six modes; at most one mode will be set to true at any time during iteration, and this activated mode represents the construct currently being looped through (a string, a regular expression, a comment etc.). The modes include:
- mode.singleQuote: Single-quote delimited string
- mode.doubleQuote: Double-quote delimited string
- mode.regex: Literal regular expression
- mode.blockComment: Block comment
- mode.lineComment: Line comment
- mode.condComp: Conditional compilation
Here’s an example trail through the loop:
Using string -> "a\"" /*Boo!*/
01. Double quote; *mode.doubleQuote* activated.
02. Letter 'a'; loop continues.
03. Character '\'; loop continues.
04. Double quote; ignored because the previous character is an escaper.
05. Double quote; last character is not '\'; so *mode.doubleQuote* de-activated.
06. Space; loop continues.
07. Character '/'; next character is asterisk; *mode.blockComment* activated - character replaced with an empty string.
08. Letter 'B'; loop continues - character replaced with an empty string.
09. Letter 'o'; loop continues - character replaced with an empty string.
10. Letter 'o'; loop continues - character replaced with an empty string.
11. Character '!'; loop continues - character replaced with an empty string.
12. Character '*' followed by '/'; *mode.blockComment* de-activated - both characters replaced with an empty string.
Result -> "a\""
There’s quite a lot of forward/back-tracking involved, which is why a couple of arbitrary characters are added to either end of the string before the loop: to make sure something is there when str[i-2] is queried.
Note: the code I used in the removeComments function could be shortened; in fact, the entire function could probably be squeezed into 20 lines but that would only slow it down. Terseness does not always equal speed, especially so in this situation; a somewhat repetitive stream of IF statements really is the only way to produce acceptable performance.
I’d love to be proven wrong in this situation so if anyone can come up with an easier way of doing this I’d love to hear it! Especially if you think you can solve this with regular expressions alone!