[ot] > With the problem of the OP however he wants to match as little as possible before the subject string, as far as I know there is nothing you can do to make that happen.
It could be done, assuming I’m understanding what you’re looking for correctly. However, in this case, using \\d is the right and proper thing to be doing.[/ot]
biglittle, it looks like your confusion arises from not quite understanding how PCRE (the regex library used for the preg_* functions) chooses what to return.
Put simply, it returns the first valid match (of course, if there is one). The subject string is searched from left to right, character by character, when looking for a match.
Given your regex, upon reaching the very first [COLOR="#006400"]page=[/COLOR] and matching it against the regex’s [COLOR="#B22222"]page=[/COLOR], things are looking good. The next part is then executed, the [COLOR="#B22222"](.*?)[/COLOR], which eats up as little as it can with an eye to still getting a successful match of the whole regex: it takes one more character at a time, checking after each one whether the rest of the regex can match from there. Since you only ask that what comes after the [COLOR="#B22222"](.*?)[/COLOR] be the literal [COLOR="#006400"]&searchId=2">Last</a>[/COLOR], it keeps eating up everything to that point, including any later [COLOR="#006400"]page=[/COLOR] it passes along the way.
As an aside, a greedy version like [COLOR="#B22222"](.*)[/COLOR] would first eat everything it can (up to the end of the line, since the dot does not match a newline by default) and then back off one character at a time until [COLOR="#006400"]&searchId=2">Last</a>[/COLOR] can be matched, so it stops at the last occurrence rather than the first. It’s greedy and wants to eat as much as possible. In your case, since [COLOR="#006400"]&searchId=2">Last</a>[/COLOR] does not occur later in the string, both greedy and non-greedy would eat the same amount. The only difference is how much of the string is examined along the way.
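To make the greedy/non-greedy aside concrete, here is a minimal sketch. It uses Python's re module as a stand-in for the preg_* functions (both are backtracking engines and treat these two quantifiers the same way), and the subject string is a made-up one, since the original is not quoted in this post.

```python
import re

# Hypothetical subject string; the original poster's string is not shown here.
subject = (
    '<span>[<a href="?page=1&searchId=2">1</a> '
    '<a href="?page=2&searchId=2">2</a> '
    '<a href="?page=9&searchId=2">Last</a>]</span>'
)
tail = re.escape('&searchId=2">Last</a>')

lazy = re.search('page=(.*?)' + tail, subject)
greedy = re.search('page=(.*)' + tail, subject)

# The tail occurs only once, so both versions capture exactly the same text.
same_capture = lazy.group(1) == greedy.group(1)

# With the tail occurring twice, lazy stops at the first occurrence
# and greedy runs on to the last one.
doubled = subject + subject
lazy_len = len(re.search('page=(.*?)' + tail, doubled).group(1))
greedy_len = len(re.search('page=(.*)' + tail, doubled).group(1))

print(same_capture, lazy_len < greedy_len)
```

Note that even the non-greedy capture starts right after the first page= and runs all the way to page=9, which is the behaviour being asked about.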
So, after [COLOR="#B22222"](.*?)[/COLOR] noms everything that it needs to, the rest of the regex goes on to try and get matched. The [COLOR="#006400"]&searchId=2">Last</a>[/COLOR] is there at this stage so the regex has found its first match. At this point, nothing else is done. The match is returned and processing of the subject string stops immediately. A different kind of regex engine, one following the POSIX rules, would keep trying alternatives from the same starting point and would return the leftmost, longest match possible (POSIX doesn’t have the concept of greedy/non-greedy): in your case, there isn’t a longer match from the initial [COLOR="#006400"]page=[/COLOR] starting point. However, PCRE gives up at the very first match that it can find.
Hopefully that hasn’t confused you entirely. In short, PCRE finds the first matching part of the subject string possible.
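A small sketch of the "first match wins" point, again in Python's re as a stand-in for preg_* and with a hypothetical subject string. It checks that the non-greedy match begins at the very first page=, and shows how restricting the capture to digits (the \d approach mentioned at the top) forces the engine to move on to the last link. The exact \d+ pattern is my assumption, not something taken from the thread.

```python
import re

# Hypothetical subject string; the original poster's string is not shown here.
subject = (
    '<span>[<a href="?page=1&searchId=2">1</a> '
    '<a href="?page=2&searchId=2">2</a> '
    '<a href="?page=9&searchId=2">Last</a>]</span>'
)
tail = re.escape('&searchId=2">Last</a>')

# Non-greedy: the match still starts at the leftmost page= in the string.
lazy = re.search('page=(.*?)' + tail, subject)
starts_at_first = lazy.start() == subject.index('page=')

# Digits only: page=1 and page=2 cannot reach the tail, so those starting
# points fail and the engine moves along until page=9 succeeds.
digits = re.search(r'page=(\d+)' + tail, subject)

print(starts_at_first)
print(digits.group(1))
```

Non-greedy only controls how far a match extends from its starting point; it has no say over where the match starts.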
A final point: since you were using preg_match_all(), after the first match is found the subject string is examined again starting at the ending point of the previous match (i.e., between [COLOR="#006400"]>[/COLOR] and [COLOR="#006400"]][/COLOR] near the end of the string). From this point, the rest of the string (only [COLOR="#006400"]]</span>[/COLOR]) does not match, so only the one match is pushed into the array.
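The resume-after-the-previous-match behaviour can be sketched the same way. Here Python's finditer plays the role of preg_match_all() (an analogy, not the same API), and the subject string is once more a hypothetical one.

```python
import re

# Hypothetical subject string; the original poster's string is not shown here.
subject = (
    '<span>[<a href="?page=1&searchId=2">1</a> '
    '<a href="?page=2&searchId=2">2</a> '
    '<a href="?page=9&searchId=2">Last</a>]</span>'
)
pattern = re.compile('page=(.*?)' + re.escape('&searchId=2">Last</a>'))

# Each new search begins where the previous match ended.
matches = list(pattern.finditer(subject))
end = matches[0].end()
remainder = subject[end:]

print(len(matches))
print(remainder)
```

After the single match, all that is left to search is the short remainder, which contains no page= at all, so the scan ends with one result.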