Trouble with preg_replace regex

StevenHu · August 18, 2014, 7:51pm

The following preg_replace lines work as they are supposed to work, but I can’t wrap my mind around WHY they work. Specifically, I don’t understand the use of $2 and $1. For instance, WHY does the first $text example return ‘ba’ and not ‘nana’? Why does $2$1 give the result the way it does?


	<?php
	$text = 'banana';
	$text = preg_replace('/(.*)(nana)/', '$1', $text);
	echo "$1 = " . $text; // $1 = ba
	
	$text2 = 'banana';
	$text2 = preg_replace('/(.*)(nana)/', '$2', $text2);
	echo "<br>$2 = " . $text2; // $2 = nana
	
	$text3 = 'banana';
	$text3 = preg_replace('/(.*)(nana)/', '$2$1', $text3);
	echo "<br>$2$1 = " . $text3; // $2$1 = nanaba
	
	$text4 = 'banana';
	$text4 = preg_replace('/(.*)(nana)/', '$1$2', $text4);
	echo "<br>$1$2 = " . $text4; // $1$2 = banana	
	?>

chris_upjohn · August 19, 2014, 4:27am

Hi Steven,

The $1 and $2 are special capture identifiers used to reference the capture groups in the expression. The number can be 1 or higher, as long as a capture group with that index exists. If we said $3, a third set of parentheses would need to exist in the expression; otherwise you would get an empty string, which is an unexpected result if you weren’t expecting it.
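
To see the empty-string case in practice, here is a small sketch using the same pattern as the question. The square brackets in the replacement strings are only an illustration device, added so that an empty result is visible.

```php
<?php
// Same pattern as in the question: two capture groups.
$pattern = '/(.*)(nana)/';

// Group 2 exists, so $2 is replaced with what it captured.
$existing = preg_replace($pattern, '[$2]', 'banana');

// There is no third group, so $3 is replaced with an empty string.
$missing = preg_replace($pattern, '[$3]', 'banana');

echo $existing . "\n"; // [nana]
echo $missing . "\n";  // []
```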

Let’s break it down.

(.*)(nana)
$1 = Capture group 1, the (.*) part
$2 = Capture group 2, the (nana) part

As you can see, because we have split the expression into two separate capture groups, we now have access to the value each one matches. Hopefully that explains it in a nutshell, as expressions can get far more advanced and much more daunting than the above, in my experience anyway.
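
If it helps to see what each group holds, one option is to run the same pattern through preg_match() and look at the array it fills in. This is just a sketch; the variable name $matches is my own choice.

```php
<?php
// preg_match() fills $matches with the whole match and each capture group.
preg_match('/(.*)(nana)/', 'banana', $matches);

echo $matches[0] . "\n"; // banana (the whole match)
echo $matches[1] . "\n"; // ba     (group 1, what $1 refers to)
echo $matches[2] . "\n"; // nana   (group 2, what $2 refers to)
```

Index 0 is always the full match, and the numbered groups start at 1, which is why the replacement string uses $1 and $2.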

StevenHu · August 19, 2014, 3:42pm

Thanks for answering.

I don’t understand how .* references just “ba”. A dot references just one letter. How does .* reference two letters?

cpradio · August 19, 2014, 3:56pm

* means 0 or more times. So a . matches any single character and the * tells it to match that zero or more times.
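
A quick way to compare the two is to capture with a lone dot and then with the dot followed by a star. The sketch below uses preg_match() instead of preg_replace() so the captured text can be inspected directly; the variable names are made up for the example.

```php
<?php
// A lone dot must match exactly one character.
preg_match('/(.)(nana)/', 'banana', $single);
echo $single[1] . "\n"; // a

// A dot followed by * matches as many characters as needed...
preg_match('/(.*)(nana)/', 'banana', $many);
echo $many[1] . "\n"; // ba

// ...including none at all.
preg_match('/(.*)(nana)/', 'nana', $none);
var_dump($none[1]); // string(0) ""
```

In the first case the match starts at the first "a", because only one character is allowed in front of "nana".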

cpradio · August 19, 2014, 3:57pm

This is the best book I’ve EVER purchased!
http://www.amazon.com/Regular-Expression-Pocket-Reference-Expressions/dp/0596514271#

I use it a LOT and it covers Regular Expressions in several programming languages.

StevenHu · August 19, 2014, 6:45pm

(.*)(nana)

So the two () are dividing up the whole word in advance. The second is referencing nana and the first is referencing all the rest of the letters before it. Is that what’s happening?

cpradio · August 19, 2014, 6:47pm

Yes
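
One thing worth adding: the groups only divide up the part of the string that the pattern actually matches. A rough sketch, using a made-up input with extra text after "banana":

```php
<?php
// " split" comes after the match, so preg_replace() leaves it alone.
$result = preg_replace('/(.*)(nana)/', '$2$1', 'banana split');

echo $result . "\n"; // nanaba split
```

Only "banana" is matched and rearranged here; anything the pattern does not match passes through unchanged.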

StevenHu · August 19, 2014, 8:26pm

Now it’s clear! Thanks.