Please help with my regex for preg_match_all

eve123 · February 11, 2010, 10:45pm

Hello, I’m writing a regex to use in preg_match_all. It works with preg_match, but I can’t get it right for preg_match_all. The pattern is [T-xxx-yyy], with xxx as digits (any length) and yyy as letters (any length). For example, [T-123-OK] or [T-8291-CANCEL] will match the pattern.

Here’s my code.


$regex = "/(.*)([T-[0-9]+-[A-Za-z]+)(.*)/";
$string = "[T-1223-DONE][T-381-CANCEL][T-547-DELETE]";
echo "<pre>";
$matches = array();
preg_match_all($regex, $string, $matches);
print_r($matches);
echo "</pre>";

It returns this array.


    [1] => Array
        (
            [0] => [T-1223-DONE][T-381-CANCEL]
        )

    [2] => Array
        (
            [0] => [T-547-DELETE]
        )

    [3] => Array
        (
            [0] => 
        )

)

What I want is something like this:


Array
(
    [0] => Array
        (
            [0] => [T-1223-DONE]
            [1] => [T-381-CANCEL]
            [2] => [T-547-DELETE]
        )
)

Also, how do I make my regex handle multiline input?
Please help, I’m a complete newbie at this regex thing. Thanks in advance!

Jake_Arkinstall · February 11, 2010, 11:10pm

The problem was mainly that your regex was very ‘greedy’. Removing the unnecessary wildcards and using the ‘+?’ combo fixes that:


$regex = "/\\[T-(\\d+?)-([A-Za-z]+)\\]/"; 
$string = "[T-1223-DONE][T-381-CANCEL][T-547-DELETE]"; 
echo "<pre>"; 
$matches = array(); 
preg_match_all($regex, $string, $matches); 
print_r($matches); 
echo "</pre>";

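As for multiline, the pattern already matches across lines because it has no ^ or $ anchors. The m flag only changes how those anchors behave, so adding it is harmless but optional: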


$regex = "/\\[T-(\\d+?)-([A-Za-z]+)\\]/m"; 
$string = "[T-1223-DONE][T-381-CANCEL][T-547-DELETE]"; 
echo "<pre>"; 
$matches = array(); 
preg_match_all($regex, $string, $matches); 
print_r($matches); 
echo "</pre>";
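
To illustrate what the m modifier does and does not change, here is a minimal sketch. The three-line $text and the anchored variants of the pattern are made up for illustration and are not from the posts above.

```php
<?php
// Sketch: the m modifier only affects the ^ and $ anchors.
$text = "[T-1-A]\n[T-2-B]\n[T-3-C]"; // hypothetical three-line input

// No anchors: tags on every line are found with or without m.
$plain = preg_match_all('/\[T-\d+-[A-Za-z]+\]/', $text, $m1);
$withM = preg_match_all('/\[T-\d+-[A-Za-z]+\]/m', $text, $m2);

// Anchored: without m, ^ matches only at the very start of the subject;
// with m, ^ also matches right after each newline.
$anchored  = preg_match_all('/^\[T-\d+-[A-Za-z]+\]/', $text, $m3);
$anchoredM = preg_match_all('/^\[T-\d+-[A-Za-z]+\]/m', $text, $m4);

echo "$plain $withM $anchored $anchoredM\n"; // 3 3 1 3
```

So the flag only matters once the pattern contains ^ or $.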

hash · February 11, 2010, 11:17pm

I don’t think non-greedy is the issue. Rather, it’s the .* at the beginning and end, and the unescaped [

$regex = "~\\[T-\\d+-[A-Z]+\\]~";
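
A minimal sketch of why the surrounding (.*) groups collapse everything into a single match, using the test string from the first post. The variable names are only for illustration, and the patterns are written in single quotes, so one backslash is enough.

```php
<?php
// Sketch: compare the pattern wrapped in (.*) with the bare tag pattern.
$string = "[T-1223-DONE][T-381-CANCEL][T-547-DELETE]";

// Wrapped: the leading .* is greedy and swallows as much as it can,
// leaving only the last tag for the middle group. The whole string is
// consumed by one match, so nothing is left for a second one.
$wrapped = '/(.*)(\[T-[0-9]+-[A-Za-z]+\])(.*)/';
$wrappedCount = preg_match_all($wrapped, $string, $wrappedMatches);

// Bare: each tag is found as its own match.
$bare = '/\[T-[0-9]+-[A-Za-z]+\]/';
$bareCount = preg_match_all($bare, $string, $bareMatches);

echo $wrappedCount, "\n"; // 1
echo $bareCount, "\n";    // 3
print_r($bareMatches[0]); // the three tags, one per element
```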

eve123 · February 12, 2010, 2:21am

Um, sorry, I made a mistake in the original regex. I just retyped it in a hurry instead of copying it from my PHP file, hence the unescaped [. What I actually used:


$regex = "/(.*)(\\[T-[0-9]+-[A-Za-z]+\\])(.*)/";

Anyway, I tried Jake’s code and it works, although I’m curious about the

(\\d+?)

Why do we need the “?”?

hash, I tried your code on a multiline string and it only returns the last match of each line.

Thanks Jake & hash!

hash · February 12, 2010, 3:10am

Hmm, works for me


$regex = "~\\[T-\\d+-[A-Z]+\\]~";
$string = "asdasd[T-111-DONE]ewrew[T-222-CANCEL]asdasd[T-333-DELETE]asdasd
asdasd[T-444-DONE]ewrew[T-555-CANCEL]asdasd[T-666-DELETE]asdasd
asdfasdfasd[T-777-NEWLINE]ewr";
echo "<pre>";
$matches = array();
preg_match_all($regex, $string, $matches);
print_r($matches);
echo "</pre>"; 
/*
Array
(
    [0] => Array
        (
            [0] => [T-111-DONE]
            [1] => [T-222-CANCEL]
            [2] => [T-333-DELETE]
            [3] => [T-444-DONE]
            [4] => [T-555-CANCEL]
            [5] => [T-666-DELETE]
            [6] => [T-777-NEWLINE]
        )
)
*/

The ? makes the quantifier non-greedy, e.g.


$str = '<p>para 1</p><p>para2</p>';
echo preg_replace('~<p>.+</p>~', ' -para- ', $str); // -para- even though there are 2
echo '<br>';
echo preg_replace('~<p>.+?</p>~', ' -para- ', $str); // -para- -para-
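
As a side note on the tag pattern from this thread, here is a small sketch suggesting the ? after \d+ makes no difference there: the digits have to stop at the literal "-" either way, so the greedy and non-greedy forms should capture the same thing. The variable names are only for illustration.

```php
<?php
// Sketch: in the tag pattern, \d+ and \d+? capture the same digits,
// because the quantifier must stop at the literal "-" either way.
$string = "[T-1223-DONE][T-381-CANCEL]";

preg_match_all('/\[T-(\d+)-([A-Za-z]+)\]/', $string, $greedy);
preg_match_all('/\[T-(\d+?)-([A-Za-z]+)\]/', $string, $lazy);

var_dump($greedy === $lazy); // bool(true)
print_r($lazy[1]);           // the captured numbers: 1223 and 381
```

Non-greedy quantifiers matter when the quantified part could also run past the delimiter, as with .+ in the paragraph example.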

eve123 · February 12, 2010, 5:58am

Hmm, I’m not sure why, but I tested it with this code. Anyway, thanks for your help! Really great community. I asked this question on another board yesterday and I still haven’t got any replies!


$regex = "~\\[T-\\d+-[A-Z]+\\]~"; 
$matches = array();
$text13 = "MY TASK [T-122-done] is DONE, 
MY TASK [T-134-DONE] is DONE, 
MY TASK [T-253-Done] is DONE,
MY TASK [T-321-Done] is DONE, 
MY TASK [T-654-DONE] is DONE";
preg_match_all($regex, $text13,$matches,PREG_PATTERN_ORDER);
echo "<pre>";
print_r($matches);
echo "</pre>";
/*
Array
(
    [0] => Array
        (
            [0] => [T-134-DONE]
            [1] => [T-654-DONE]
        )

)
*/

salathe · February 12, 2010, 9:44am

With the code in your last post, the “yyy” part only matches uppercase letters, but some of your tags include lowercase letters (“Done”). Either change the character class (“[…]”) to allow lowercase letters or make the entire regular expression case-insensitive by using the “i” modifier (“…~i”).
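
Both options can be sketched against the same test text (rebuilt here with explicit "\n" line breaks, with variable names chosen only for illustration):

```php
<?php
// Sketch: the five-line test text, matched three ways.
$text13 = "MY TASK [T-122-done] is DONE,\n"
        . "MY TASK [T-134-DONE] is DONE,\n"
        . "MY TASK [T-253-Done] is DONE,\n"
        . "MY TASK [T-321-Done] is DONE,\n"
        . "MY TASK [T-654-DONE] is DONE";

// Uppercase-only class: skips "done" and "Done".
$strict = preg_match_all('~\[T-\d+-[A-Z]+\]~', $text13, $strictMatches);

// Option 1: widen the character class to allow lowercase letters.
$wide = preg_match_all('~\[T-\d+-[A-Za-z]+\]~', $text13, $wideMatches);

// Option 2: keep [A-Z] and add the i modifier.
$insensitive = preg_match_all('~\[T-\d+-[A-Z]+\]~i', $text13, $iMatches);

echo "$strict $wide $insensitive\n"; // 2 5 5
```

One difference to keep in mind: with the i modifier the leading T becomes case-insensitive too, so a tag such as [t-1-ok] would also match, whereas widening only the character class keeps the T strict.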