Trouble removing css comments with RegExp

I want to remove comments from css files using nodejs. I have this working fairly well:


String.prototype.clean = function(){
  return this.replace(/(\n|\r|\t|\f|\v)+/g,'');
};
  S = S.clean(); // Remove line breaks, tabs etc.
  var e1 = new RegExp("/[*][^*/]*[*]/",'g'); // Remove comments.
    S = S.replace(e1,r1);
  var e2 = new RegExp("/[*][^/]*[*]/",'g'); // Remove remaining comments.
    S = S.replace(e2,r1);
  S = S.replace(/ +/g,' '); // Remove redundant whitespace.
  S = S.replace(/ *{ */g,'{'); // Remove spaces around {
  S = S.replace(/ *} */g,'}'); // Remove spaces around }
  S = S.replace(/ *: */g,':'); // Remove spaces around :
  S = S.replace(/ *; */g,';'); // Remove spaces around ;

The e1 above fails to remove this comment:

/*********************************/

Hence I made e2 to get rid of that.

But I still have one comment that I can’t get rid of, it looks like this:

/*background:transparent 0 0 no-repeat url('/grontmij/img/close_button.png');*/
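As a rough check of what is going on, the sketch below runs e1 and e2 against the two problem comments. The variable names (stars, withSlashes and so on) are made up for illustration, and the replacement is assumed to be an empty string.

```javascript
// The two expressions, built the same way as in the post.
var e1 = new RegExp("/[*][^*/]*[*]/", 'g'); // no * and no / allowed inside
var e2 = new RegExp("/[*][^/]*[*]/", 'g');  // * allowed inside, / still not

var stars = "/*********************************/";
var withSlashes = "/*background:transparent 0 0 no-repeat url('/grontmij/img/close_button.png');*/";

// e1 needs the closing * to be followed directly by /, but the class [^*/]
// cannot step over the extra stars, so nothing is removed.
var starsAfterE1 = stars.replace(e1, '');
// e2 can step over the stars, so this comment disappears.
var starsAfterE2 = stars.replace(e2, '');
// The url() path contains /, which neither character class may cross.
var slashesAfterE1 = withSlashes.replace(e1, '');
var slashesAfterE2 = withSlashes.replace(e2, '');

console.log(starsAfterE1 === stars);          // true
console.log(starsAfterE2 === '');             // true
console.log(slashesAfterE1 === withSlashes);  // true
console.log(slashesAfterE2 === withSlashes);  // true
```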

I could get rid of everything with this simpler one:

new RegExp("/[*].*[*]/",'g');

But that becomes a greedy match and removes everything (including code) between the first and the last comment in the code.

I am sure that with the right tweaking, e1 should be able to fix all comments. What I am trying to say to it is:

Match everything except */ between /* and */

but I am obviously failing to do so.
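One way to write that sentence down literally is a negative lookahead, sketched below. This is a different technique from the character classes in e1 and e2, and the names commentPattern, css and stripped are only illustrative.

```javascript
// (?!\*\/) refuses to consume a character at the point where "*/" begins,
// so the comment body may contain * and / on their own but never the closing pair.
var commentPattern = /\/\*(?:(?!\*\/)[\s\S])*\*\//g;

var css = "a{}/*x*/b{}/***/c{}/* url('/p/q.png'); */d{}";
var stripped = css.replace(commentPattern, '');

console.log(stripped); // a{}b{}c{}d{}
```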

Any ideas?

See attached CSS files.

Hi there,

This should work for you:

/\/\*.+?\*\//gs

I suppose in JavaScript it would be something like:

new RegExp("/\\/\\*.+?\\*\\//",'gs');

But you might have to tweak that a bit.

Anyway, you can see it in action here: http://regexr.com?32tdv

I hope that helps.

No, in JavaScript it would be

/\/\*.+?\*\//g

JavaScript only gained the ‘s’ flag in ES2018, so older engines reject it, and wrapping the expression inside // is the equivalent of wrapping it in new RegExp()

So the complete code to remove the comments in place of the four lines of code not currently working properly would be:

S = S.replace(/\/\*.+?\*\//g,'');

The ‘m’ modifier only changes where ^ and $ match; without the ‘s’ flag the dot never matches a line break in JavaScript. The expression works for multiline comments here only because the line breaks have already been removed from the string before the comments are replaced.

Oh ok, thanks for that, felgal.
Reg exs are a pain in the neck at the best of times :)
If JavaScript doesn’t support the ‘s’ flag, is there any way of making a dot match a new line?

Edit: Ah ok, you edited your post. Thanks.
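For what it's worth, a common workaround when the ‘s’ flag is not available is to use [\s\S] (any whitespace or non-whitespace character) in place of the dot. A small sketch, with made-up variable names and sample CSS:

```javascript
var css = "a{color:red}/* first\n   comment */b{color:blue}/* second */";

// The dot stops at the line break, so the first comment survives.
var dotVersion = css.replace(/\/\*.+?\*\//g, '');

// [\s\S] matches line breaks too; *? also allows an empty comment like /**/.
var anyCharVersion = css.replace(/\/\*[\s\S]*?\*\//g, '');

console.log(JSON.stringify(dotVersion));
// "a{color:red}/* first\n   comment */b{color:blue}"
console.log(JSON.stringify(anyCharVersion));
// "a{color:red}b{color:blue}"
```

With this form the comments can be removed before the line breaks are stripped, so the order of the two steps no longer matters.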

Thanks a lot to y’all - I will give it a try and let you know how it works!

Thanks a bundle - works like a charm! I now have this function running:


var cleanCss = function(options){
  var options = options || {};
  var fileSource = options.fileNameSource || __dirname+'/test.source.css';
  var fileTarget = options.fileNameTarget || __dirname+'/test.target.css';
  var encoding = options.encoding || 'utf-8' ;
  var cssSource = fs.readFileSync(fileSource,encoding);
  var callback = options.callback || function(){ log(fileSource,'Done cleaning with cleanCss and saved to',fileTarget); };
  var T = cssSource;
  T = T.replace(/(\n|\r|\t|\f|\v)+/g,''); // Remove line breaks, tabs etc.
  T = T.replace(/\/\*.+?\*\//g,''); // Remove comments.
  T = T.replace(/ +/g,' '); // Remove redundant whitespace.
  T = T.replace(/ *{ */g,'{'); // Remove spaces around {
  T = T.replace(/ *} */g,'}'); // Remove spaces around }
  T = T.replace(/ *: */g,':'); // Remove spaces around :
  T = T.replace(/ *; */g,';'); // Remove spaces around ;
  fs.writeFile(fileTarget, T, encoding, function(){
    if(typeof callback === 'function'){ callback(); }
  });
};

Hm, why doesn’t this forum have syntax highlighting on javascript …?

Anyways, I don’t understand why your expression works:

/\/\*.+?\*\//

What it says to me is 'Match any character one or more times between “/*” and “*/” '.

And what does the “?” mean there?

I used this reference: http://www.w3schools.com/jsref/jsref_obj_regexp.asp
“.” is any character except a newline.
“+” is previous character one or more times, which is “.”
“?” is zero or one time the previous character, which is - eh “.+”?

How does that not become greedy?

Obviously, the “?” makes the whole difference. If not there, it becomes greedy.

Hi there,

First off, glad it’s working.

AFAIK, (and I’m not an expert by any means) the reg ex works like this:
It matches a forward slash \/
A star \*
(both of these have to be masked so that they lose their special meaning within the reg ex).

It then matches any character .
Which should occur one or more times +
The ? that follows makes it non-greedy (i.e. the match should consume as few characters as possible).

Then it matches a star \* and a final slash \/
(which again have to be escaped).

To help you get your head around greedy vs non-greedy consider this example:

a = "/*Sitepoint*/"
a.match(/Sitepoint/)
=> "Sitepoint"

a.match(/\/\*.+\*\//)
=> "/*Sitepoint*/"

a = "/*Sitepoint*/ is /*great*/"
a.match(/\/\*.+\*\//)
=> "/*Sitepoint*/ is /*great*/"

a = "/*Sitepoint*/ is /*great*/"
a.match(/\/\*.+?\*\//)
=> "/*Sitepoint*/"

You see that by putting in the question mark on the final example, we make the reg ex non-greedy.
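The same difference shows up when replacing rather than matching, which is where the lost code in the original question came from. A small sketch with a made-up CSS string:

```javascript
var css = "/*a*/p{color:red}/*b*/";

// Greedy: .+ runs to the last */ in the string, taking the rule with it.
var greedy = css.replace(/\/\*.+\*\//g, '');

// Non-greedy: .+? stops at the first */ after each /*, so only comments go.
var lazy = css.replace(/\/\*.+?\*\//g, '');

console.log(JSON.stringify(greedy)); // ""
console.log(JSON.stringify(lazy));   // "p{color:red}"
```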

In the link I posted, you can also hover over the characters in the bar at the top of the page and an explanation will appear as to what they do.

HTH

Oh, BTW, JavaScript highlighting is available if you select “Go advanced” -> “Syntax” -> “JavaScript”

Thanks a lot, think I understand now!
And sorry, I missed the link previously to the cool page for testing RegExp!

Just trying some js highlighting to see if it works:


if(learning){ alive=true; }

I was gonna ask if you can also do capturing groups - and apparently you can:


T = T.replace(/ *(:|;|{|}) */g,"$1"); // Remove spaces around those characters
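A quick sketch of what that line does: $1 puts back whatever the group captured, so only the surrounding spaces are dropped. The character class [:;{}] should behave the same as the alternation for these four single characters. The sample string and variable names are made up.

```javascript
var css = "a { color : red ; }";

// Alternation inside a capturing group, as in the line above.
var viaAlternation = css.replace(/ *(:|;|{|}) */g, "$1");

// The same idea with a character class instead of alternation.
var viaClass = css.replace(/ *([:;{}]) */g, "$1");

console.log(viaAlternation); // a{color:red;}
console.log(viaClass);       // a{color:red;}
```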