Trouble with preg_replace regex

The following preg_replace lines work as they are supposed to work, but I can’t wrap my mind around WHY they work. Specifically, I don’t understand the use of $2 and $1. For instance, WHY does the first $text example return ‘ba’ and not ‘nana’? Why does $2$1 give the result the way it does?


	<?php
	$text = 'banana';
	$text = preg_replace('/(.*)(nana)/', '$1', $text);
	echo "$1 = " . $text; // $1 = ba
	
	$text2 = 'banana';
	$text2 = preg_replace('/(.*)(nana)/', '$2', $text2);
	echo "<br>$2 = " . $text2; // $2 = nana
	
	$text3 = 'banana';
	$text3 = preg_replace('/(.*)(nana)/', '$2$1', $text3);
	echo "<br>$2$1 = " . $text3; // $2$1 = nanaba
	
	$text4 = 'banana';
	$text4 = preg_replace('/(.*)(nana)/', '$1$2', $text4);
	echo "<br>$1$2 = " . $text4; // $1$2 = banana	
	?>

Hi Steven,

The $1 and $2 are special references to the capture groups in the expression. The number can be 1 or higher, as long as a capture group with that index exists. If we wrote $3, a third set of parentheses would need to exist in the expression; otherwise you would get an empty string, which is an unexpected result if you weren’t expecting it.

Let’s break it down.
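Here is a minimal sketch of that, reusing the pattern from the question. The square brackets in the replacement string are only there to make each reference visible, and $result is an illustrative variable name. $0 refers to the whole match, and $3 has no matching group, so it comes back empty.

```php
<?php
// The pattern has two capture groups, so $1 and $2 exist but $3 does not.
$result = preg_replace('/(.*)(nana)/', '[$0][$1][$2][$3]', 'banana');

// $0 is the whole match, and the reference to the missing group 3
// is replaced with an empty string.
echo $result; // [banana][ba][nana][]
```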

(.*)(nana)
$1 = Capture group 1: (.*)
$2 = Capture group 2: (nana)

As you can see, because we have split the expression into two separate capture groups, we now have access to the value each one matched. preg_replace swaps the entire matched text (here, all of ‘banana’) for the replacement string, so a replacement of $1 on its own leaves just ‘ba’. Hopefully that explains it in a nutshell, as expressions can get far more advanced and much more daunting than the above, in my experience anyway.
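If it helps to see what each group holds, here is a rough sketch using preg_match with the same pattern. The input 'banana split' and the variable names ($pattern, $m, $replaced) are made up for illustration; the extra text shows that only the matched part gets replaced.

```php
<?php
$pattern = '/(.*)(nana)/';

// preg_match puts the whole match at index 0 and each capture group after it.
preg_match($pattern, 'banana', $m);
print_r($m); // [0] => banana, [1] => ba, [2] => nana

// preg_replace replaces the whole match ("banana") with the replacement,
// so the text outside the match (" split") is left untouched.
$replaced = preg_replace($pattern, '$1', 'banana split');
echo $replaced; // ba split
```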

Thanks for answering.

I don’t understand how .* references just “ba”. A dot references just one letter. How does .* reference two letters?

* means 0 or more times. So a . matches any single character (except a newline, by default) and the * tells it to match that zero or more times.
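A small sketch of the difference, assuming the same 'banana' input as the question (the variable names are just for illustration). The .* is greedy: it grabs as much as it can and then gives characters back until the rest of the pattern can match, which is why group 1 ends up as ‘ba’.

```php
<?php
// A lone dot matches exactly one character.
preg_match('/./', 'banana', $one);
echo $one[0], "\n"; // b

// With the star, the dot repeats and takes everything it can.
preg_match('/.*/', 'banana', $all);
echo $all[0], "\n"; // banana

// Followed by (nana), the .* has to give back "nana" so the pattern can match.
preg_match('/(.*)(nana)/', 'banana', $m);
echo $m[1], "\n"; // ba

// The star also allows zero repetitions, so group 1 can be empty.
preg_match('/(.*)(nana)/', 'nana', $empty);
var_dump($empty[1]); // string(0) ""
```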

This is the best book I’ve EVER purchased!
http://www.amazon.com/Regular-Expression-Pocket-Reference-Expressions/dp/0596514271#

I use it a LOT and it covers Regular Expressions in several programming languages.

(.*)(nana)

So the two () are dividing up the whole word in advance. The second is referencing nana and the first is referencing all the rest of the letters before it. Is that what’s happening?

Yes

Now it’s clear! Thanks.