How to escape replacement string for preg_replace?

Suppose I have this code:


$pattern = '/some pattern/';
$replacement = $_POST['name'];

$str = preg_replace($pattern, $replacement, $source);

$replacement holds user input and can contain virtually anything; I just want the pattern to be replaced with the literal data in $replacement. However, preg_replace does not work well for this if $replacement contains special characters such as ‘\’. Also, ‘$’ followed by a digit has a special meaning, so that may break as well. How do I escape $replacement so that the replacement is bulletproof, i.e. works with any data I throw at it? So far I have found that:

  1. preg_quote is no good because it is designed to be used for patterns only
  2. when I escape each ‘\’ with another ‘\’ the function seems to work as expected

But I wasn’t able to find any documented method for proper escaping. I still don’t know what to do with dollar signs: should they all be escaped, or only those followed by digits? Should all backslashes be escaped? Are there any other characters that should be escaped?

Backslash should work for all metacharacters.

All backslashes that are not regex metacharacters should be escaped, which is to say all except the two used to “quote” the expression. The material about preg_x functions in the PHP Manual should contain a list of all metacharacters that need to be escaped.

First, replace the pattern with a placeholder of your choosing, like “##NAME##”.

Then use str_replace to replace ##NAME## with $_POST['name'].

Now you don’t have to worry about what’s in $_POST['name'], as you’re not using a regular expression function on it at all.
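A minimal sketch of the two steps above. The sample $source and $name values are made up for illustration ($name stands in for $_POST['name']), and it assumes the placeholder text never occurs in $source on its own:

```php
<?php
// Hypothetical inputs standing in for $source and $_POST['name'].
$source = 'Hello some pattern!';
$name   = 'A\\1 costs $5';   // literal text: A\1 costs $5

// Step 1: the regex swaps each match for a fixed placeholder,
// so no user data ever reaches preg_replace().
$tmp = preg_replace('/some pattern/', '##NAME##', $source);

// Step 2: str_replace() treats both arguments as plain strings,
// so "\1" and "$5" are copied through untouched.
$str = str_replace('##NAME##', $name, $tmp);

echo $str, PHP_EOL;
```

If the source text could already contain ##NAME##, a less guessable placeholder would be needed.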

I have the impression that you are confusing the pattern with the replacement string. The two characters that quote the expression belong to the pattern; the replacement string does not have them.

The material about preg_x functions in the PHP Manual should contain a list of all metacharacters that need to be escaped.

There is a list for the pattern, I was not able to find any list for the replacement string.

Dan, that is a nice workaround but I’m wondering if it’s possible to make preg_replace behave as expected. I’d prefer not to use workarounds if there is a more elegant solution.

Or, er, preg_quote()? :P

Edit:

  1. preg_quote is no good because it is designed to be used for patterns only
    Whoops! Missed that!

preg_replace_callback() lets you control the replacement, so there won’t be any special interpretation of it. You could also preg_split() and then just implode() with $replacement as the delim.
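A rough sketch of the preg_split() plus implode() idea, with made-up sample values. It assumes the default preg_split() flags, so the matched text itself is discarded:

```php
<?php
// Hypothetical sample data.
$source      = 'a-1-b-22-c';
$replacement = '$1\\1';   // literal text: $1\1

// preg_split() cuts $source at every match of the pattern;
// implode() then glues the pieces back together with $replacement,
// which is treated as plain text.
$pieces = preg_split('/\d+/', $source);
$str    = implode($replacement, $pieces);

echo $str, PHP_EOL;
```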

As crmalibu said, when using a callback you choose what the replacement string is:


function my_callback($match) {
    return $_POST['name'];
}
$pattern = '/some pattern/';
$str = preg_replace($pattern, 'my_callback', $source); 

That’s what preg_replace does.

I think the issue is if the user supplied string contains a backreference like $1 or \1 etc…

Ah … Silly me.

Salathe,

There is a bug in your code, should be:


...
$str = preg_replace_callback($pattern, 'my_callback', $source);
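For comparison, a sketch of the same callback approach written with an anonymous function, so the replacement is passed in explicitly rather than read from $_POST inside the callback. The sample values are invented for illustration:

```php
<?php
// Hypothetical sample data.
$source      = 'x some pattern y';
$replacement = 'cost: $1 \\2';   // literal text: cost: $1 \2

$str = preg_replace_callback(
    '/some pattern/',
    // Whatever the callback returns is inserted as-is;
    // "$1" and "\2" are not treated as backreferences here.
    function (array $match) use ($replacement) {
        return $replacement;
    },
    $source
);

echo $str, PHP_EOL;
```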

Yes, the problem occurs when the replacement string contains backreferences, so anything like $1, $2, $11, \1 or any multiple occurrences of backslashes. Some accurate workarounds were given in this thread, so thanks to all for responding. To sum up, these are the solutions that I found to work:

  1. using preg_replace_callback indeed turns off parsing of backreferences and can be used safely
  2. using preg_split and then implode with $replacement
  3. escaping every backslash and dollar sign with a backslash seems to work fine in my tests:

$str = preg_replace($pattern,
    strtr($replacement, array('\\' => '\\\\', '$' => '\\$')),
    $source);
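Solution 3 could be wrapped in a small helper and mapped over an array of replacements, roughly like this. The function name escape_replacement, the :name/:note placeholders and the sample strings are all hypothetical, chosen only for this sketch:

```php
<?php
// Hypothetical helper; not a built-in PHP function.
function escape_replacement(string $s): string
{
    // In single quotes, '\\' is one backslash and '\\\\' is two.
    // Every backslash and every dollar sign gets a backslash in front.
    return strtr($s, array('\\' => '\\\\', '$' => '\\$'));
}

// Made-up placeholders and values.
$patterns     = array('/:name/', '/:note/');
$replacements = array('O\\1Brien', 'costs $5 or ${1}');

$str = preg_replace(
    $patterns,
    array_map('escape_replacement', $replacements),
    'name=:name; note=:note'
);

echo $str, PHP_EOL;
```

Note that with array arguments the patterns are applied one after another, so a later pattern can still match text inserted by an earlier replacement.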

I’m using preg_replace in my database class to emulate prepared statements, which is why I need it to work with any string. For now I will go with solution 3, because in my case $pattern and $replacement are arrays that handle multiple placeholder substitutions. With the other solutions I would probably need to run preg_replace multiple times on the same string, which would be a bit worse performance-wise.