How to understand preg_match

I have a preg_match that is used to verify that a form has a valid phone number. I understand some of it, but not all. I want to be able to accept the form with the phone number blank (I am asking for several different phone numbers that not everyone will have). The first number I ask for is a fax number.

if (preg_match('%^([0-9]( |-)?)?(\\(?[0-9]{3}\\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$%', stripslashes(trim($_POST['fax'])))) {
    $fax = escape_data($_POST['fax']);
} else {
    $fax = FALSE;
    echo '<p style="color:red;">Please enter a valid fax number!</p>';
} // end of check for fax

It starts off with '%^; what does this mean/do? It ends with $%' and again I do not know what it means/does. (Could it be similar to <?php and ?> marking the start and end of that section?)
I get that only 0-9 is accepted, then I do not know what is happening again until I get to the second [0-9].
I think {3} is requiring three digits; after that I do not know.
Toward the end, it looks like I could accept a seven-character phone number such as SK12345 or even ABCDEFG.

Does ( |-) indicate that either a space or a dash is acceptable at a specific location?

The percent signs are used as delimiters. Commonly the forward slash is used for that instead, but some people have different preferences. If you need to refer to the delimiter character within the expression itself, you need to escape it with a backslash, so it's useful to pick a delimiter that you won't commonly come across in your regular expressions.
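A small sketch of the delimiter point, using an invented "555/1234" input: the same pattern is written once with slashes and once with percent signs. Only the slash-delimited version needs the literal slash escaped.

```php
<?php
// The same pattern written with two different delimiters.
// Both match three digits, a literal slash, then four digits.
$withSlashes  = '/^[0-9]{3}\/[0-9]{4}$/'; // '/' delimiter: the literal slash must be escaped
$withPercents = '%^[0-9]{3}/[0-9]{4}$%';  // '%' delimiter: the slash needs no escaping

var_dump(preg_match($withSlashes, '555/1234'));  // int(1)
var_dump(preg_match($withPercents, '555/1234')); // int(1)
```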

Here's a breakdown of the regular expression %^([0-9]( |-)?)?(\(?[0-9]{3}\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$%

%…% delimiters
^…$ the match must be from the very start to the very end of the text
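To see what the ^…$ anchors change, here is a short sketch with a made-up input: without them preg_match() succeeds if the pattern appears anywhere in the text; with them the entire text has to fit.

```php
<?php
// Without ^ and $, preg_match() succeeds if the pattern appears anywhere in the text.
$unanchored = '%[0-9]{3}%';
// With ^ and $, the whole text must consist of exactly three digits.
$anchored = '%^[0-9]{3}$%';

$input = 'abc123def';
var_dump(preg_match($unanchored, $input)); // int(1): "123" is found in the middle
var_dump(preg_match($anchored, $input));   // int(0): the text as a whole is not three digits
var_dump(preg_match($anchored, '123'));    // int(1)
```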

About ([0-9]( |-)?)?
(…)? What's inside is entirely optional
[0-9]( |-)? a single digit followed by an optional space or dash

About (\(?[0-9]{3}\)?|[0-9]{3})( |-)?
(…|…) match either the left part or the right part
\(?..\)? optional parentheses around the area code
[0-9]{3} match exactly three digits
( |-)? an optional space or dash

The parentheses should not be independently optional, though. As it currently stands, the area code doesn't need both parentheses: it could have only one of them and would still be considered valid.
(nnn) nnn nnnn Right
nnn nnn nnnn Right
(nnn nnn nnnn Right when it should be wrong
nnn) nnn nnnn Right when it should be wrong

What I think would work better is to remove the optional ? from each parenthesis and use (…|…) to ensure that you get a match with either both parentheses or none.
(\([0-9]{3}\)|[0-9]{3})
(nnn) nnn nnnn Right
nnn nnn nnnn Right
(nnn nnn nnnn Wrong
nnn) nnn nnnn Wrong
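A quick way to check the difference is to run just the area-code group, anchored, against the four shapes above. This sketch uses 555 in place of nnn; the sample values are invented.

```php
<?php
// Area-code group from the original pattern (each parenthesis optional on its own)
// and the suggested fix (both parentheses or none), anchored so only this group is tested.
$original  = '%^(\(?[0-9]{3}\)?|[0-9]{3})$%';
$corrected = '%^(\([0-9]{3}\)|[0-9]{3})$%';

foreach (['(555)', '555', '(555', '555)'] as $areaCode) {
    printf(
        "%-6s original: %s  corrected: %s\n",
        $areaCode,
        preg_match($original, $areaCode) ? 'match' : 'no match',
        preg_match($corrected, $areaCode) ? 'match' : 'no match'
    );
}
```

The original group accepts all four; the corrected one rejects the two unbalanced forms.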

The last part of the regular expression, the final (…|…) structure, is now straightforward.

One of these two must match.
Either [0-9]{3}( |-)?[0-9]{4}
Which is seven digits with an optional gap in the middle

Or, [a-zA-Z0-9]{7}
Which is a seven-character word, which would match, for example, 555-wilkins
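Putting the pieces together, this sketch takes the original pattern with only the area-code group changed to "both parentheses or none" and runs it against a few invented inputs, including the 555-wilkins word form.

```php
<?php
// Original pattern with only the area-code group changed to "both parentheses or none".
$pattern = '%^([0-9]( |-)?)?(\([0-9]{3}\)|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$%';

$samples = [
    '(555) 123-4567', // parentheses around the area code
    '555 123 4567',   // spaces as separators
    '1-555-123-4567', // leading single digit with a dash
    '555-wilkins',    // seven-character word after the area code
    '(555 123 4567',  // unbalanced parenthesis
    '555) 123-4567',  // unbalanced parenthesis
];

foreach ($samples as $sample) {
    echo str_pad($sample, 16), preg_match($pattern, $sample) ? 'valid' : 'invalid', "\n";
}
```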

It might be more beneficial to tell us what you will be accepting as valid phone/fax numbers, and if you are going to be cleaning them up before storing them.

What you may permit people to enter:

0123 456 789
0123 456-789
0123456789
(0123) 456-789

In each of the above perhaps this is what you may want to store:

0123456789

Or then again, perhaps you are happy to accept and store all of the above?

Phone numbers are generally quite easy to Filter against expectations and to Sanitize for your own storage/retrieval.

PS: When it comes to regexes, you may find it useful to download a desktop regex tool (but be conscious that there are many regular expression dialects).

Paul, Thanks for the explanation. Since I want to allow the form to be blank if they do not have a fax number, I should set up something like ((current expression)|{0}) to enforce correct numbers and allow the form to be blank?

Cups, Not sure I care what format they use to enter the phone numbers. I think I have the display set to use (123) 123-1234 but I will need to check that it works for different entry formats, or just store the 10 digits with no formatting (what I have used for testing up to this point). If I decide to remove the formatting, what tool should I be using?

I will look at the desktop regex tool search. I did not even think about the possibility of such tools, but I did see several online tools while I was trying to understand what I currently have.

The more I learn about PHP, the more I realize how little I know, but it sure is great to have this forum to help me understand. :slight_smile:

Perhaps you could add this to your code to check if the field is valid or empty:

if (empty($_POST['fax']) || preg_match ('%^([0-9]( |-)?)?(\\(?[0-9]{3}\\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$%', stripslashes(trim($_POST['fax'])))) { ...

Normally I do this the other way around, checking whether the field is not empty and not valid, in which case an error is thrown. So I'm not sure the above will work so well, but it's worth a try. (Warning: PHP noob!)

I don’t think that that would be as easy to understand as something like this instead:


if (preg_match(...)) {
    // handle fax number
    ...
} else if (empty($_POST['fax'])) {
    // allowed to be empty, optional, not required
} else {
    // fail
    ...
}
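Filled in, the structure above could look something like this sketch. The helper name checkOptionalPhone() and its null/false return convention are hypothetical choices for illustration, and the area-code group uses the "both parentheses or none" fix. The original's stripslashes()/escape_data() calls are left out; apply your own escaping before the value goes into the database.

```php
<?php
/**
 * Hypothetical helper: returns the trimmed number if it matches the pattern,
 * null if the field was left blank (allowed, since the field is optional),
 * or false if something was entered but it is not a valid number.
 */
function checkOptionalPhone(string $raw)
{
    $value = trim($raw);
    $pattern = '%^([0-9]( |-)?)?(\([0-9]{3}\)|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$%';

    if (preg_match($pattern, $value)) {
        return $value;   // handle fax number
    } elseif ($value === '') {
        return null;     // allowed to be empty, optional, not required
    }
    return false;        // fail
}

$fax = checkOptionalPhone($_POST['fax'] ?? '');
if ($fax === false) {
    echo '<p style="color:red;">Please enter a valid fax number!</p>';
}
```

The blank check uses $value === '' rather than empty(), because in PHP empty('0') is true, so a stray "0" would otherwise be treated as "left blank".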

While it’s possible to do many things with regular expressions, there are times when doing so makes things more complex, which is a state to be avoided if at all possible.

Paul & Ralph, Thanks. This is another example of me trying to find the hard way to do something. :slight_smile:

Both of you have shown me a way that is easy for me to understand and even easier to implement.

Thanks. I like to keep the code as easy and simple as practical. Why?

While programming is one of the most complex things we can do, debugging that code is an order of magnitude more complex still. If we push our coding to the limits of our ability, successfully debugging that code becomes impossible.
So, by keeping the code simple and easy to understand, we help to ensure that it’s a lot easier to keep it problem free.

That's my bad paraphrasing of the programmer Douglas Crockford, but I find it a smart idea to stand by.

If you did only want to capture the numbers from a string:


$input = '(0123) 123-1234';
$output = preg_replace('#[^0-9]#', '', $input);
echo($output);
// gives
// 01231231234

Then check whether there are more than 9 (that is, at least 10) digits remaining:


if ( strlen($output) > 9 ){
// well, it seems to fit my pattern ...


}
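The two steps can be wrapped in one small function. This is a sketch; the name digitsIfLongEnough() and the $minDigits parameter are hypothetical, and the inputs are the sample formats from earlier in the thread.

```php
<?php
// Hypothetical helper combining the two steps above:
// 1) remove every character that is not a digit, 2) check how many digits are left.
function digitsIfLongEnough(string $input, int $minDigits = 10)
{
    $digits = preg_replace('#[^0-9]#', '', $input);
    // strlen($digits) > 9 in the snippet above is the same test as >= 10
    return strlen($digits) >= $minDigits ? $digits : false;
}

foreach (['(0123) 123-1234', '0123 456-789', '123-4567'] as $input) {
    var_dump(digitsIfLongEnough($input));
}
```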

Paul, That makes sense. Hopefully, I can learn to keep my programming simple.

Cups, WOW! That would reduce the code to something much easier than what I have been using.

What I actually need is 10 digits. We don't have anyone in the group outside of a three-state area here in the U.S.

Let me think and I will post something else.

What about this? I am not at my development computer, so I have not tested it.


if ( strlen($output) = 10){
// well, it seems to fit my pattern ...


}  

Keeping it simple is going to be a problem for me. I first thought of using >9 and <11 for testing the length. Maybe this OLD dog can learn a new trick or two.

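A single = assigns a value to a variable, so what you posted will not work. (Here PHP will actually stop with a fatal error, because you cannot assign to a function's return value; with a plain variable, as in if ($len = 10), it would silently assign 10 and always evaluate as true, so it might seem to work.) Use double ==, or even better, get into the habit of using triple === when making a comparison, which is what you are doing with this code.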


if ( strlen($output) === 10){ 
// well, it seems to fit my pattern ... 


} 
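A short sketch of why === is the safer habit: strlen() always returns an int, so the strict check works as intended, while == converts types before comparing. The input number is invented.

```php
<?php
// strlen() always returns an int, so === 10 compares type and value.
$output = preg_replace('#[^0-9]#', '', '(012) 345-6789');
var_dump(strlen($output) === 10); // bool(true)

// == converts types before comparing; === does not.
var_dump('10' == 10);     // bool(true)  the string is converted to a number
var_dump('10' === 10);    // bool(false) string vs int
var_dump('1e1' == '10');  // bool(true)  both numeric strings, compared as numbers
var_dump('1e1' === '10'); // bool(false)
```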

If you are absolutely sure all numbers MUST be 10 chars (including cell phones?) then reject anything shorter/longer – just try and clearly hint that on your GUI.

CAVEAT:

Glad you find my code simpler, but my example is not doing the same job as the code you posted.

Originally you were checking to see if what the user entered fit one of a number of patterns (numbers, spaces and so on), whereas what I posted was partly in response to your reply about you only wanting to store the numbers.

Hence I am kind of simplifying things by a) wiping out anything NOT a number, then b) just counting the length of what is left over.

Cups,

I will give your code a run and see what happens. I expect it will be tomorrow before I get a chance to work on the code. I have an office party to attend tonight.

Cups,

I think I have the code running, but now I remember something you said about not writing the same code many times.

This check for a valid phone number is done several times, including home phone, office phone, cell phone, fax phone, etc.
Is there a way to collect the various posted phone numbers and just have the code repeat until all the phone numbers are processed?
Would I need to change the form names to all have a preface of something like ph so that my fax number would be ph_fax and the office number would be ph_office?

Good question.

You could rely on a prefix, but that would imply that you carry that prefix and name from your html form, through your validation and on into your database table.

HTML
<input type=text id=ph_office name=ph_office />

DBASE


contacts
=======
id | 23
name | Ramone De'Ath
ph_office | 1231231234

If that were the case, you could indeed check for ph_* and process the number, or simply set an array of elements to be checked as phone numbers, as in the example below:



// spoof the behaviour of your form being submitted

$_POST['tel'] = '(0123) 123-1234';
$_POST['mobile'] = '0123-123-1234';
$_POST['none_existent'] = '(0123) 123-1234';
$_POST['fax'] = '';

// a simple function returns numbers only, include this in a file as necessary

function extractNumbers($input){
$output = preg_replace('#[^0-9]#', '', $input);
return $output;
}

// set the elements which should contain numbers only

$tels = array('fax', 'tel', 'mobile');

// reassign the numbers back to your POST array in this case
// though you could create new variables ready for input to 
// your database if you wanted

foreach( $tels as $tel){

  if( array_key_exists($tel, $_POST )){
    $_POST[$tel] = extractNumbers($_POST[$tel]);
  }

}

var_dump($_POST);

// gives

  'tel' => string '01231231234' (length=11) // correct
  'mobile' => string '01231231234' (length=11) // correct
  'none_existent' => string '(0123) 123-1234' (length=15) // unexpected, so ignored
  'fax' => string '' (length=0)  // should not really appear - but as I forced it into the spoof POST array - it does


Using the ph_* prefix in some kind of magic way could be a good way to have JS do some extra checking on the client, but otherwise it could become something important that you overlook in subsequent applications. Better to set the $tels array explicitly, IMO (at this stage).
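For comparison, the ph_* prefix approach from the question could be sketched like this. The ph_office/ph_fax field names follow the naming suggested in the thread and are otherwise hypothetical, and a plain $post array stands in for $_POST.

```php
<?php
// Same idea as extractNumbers() above: keep digits only.
function extractNumbers($input)
{
    return preg_replace('#[^0-9]#', '', $input);
}

// Spoofed form submission using the ph_ prefix naming idea.
$post = [
    'name'      => "Ramone De'Ath",
    'ph_office' => '(123) 123-1234',
    'ph_fax'    => '',
];

foreach ($post as $field => $value) {
    // Only fields whose names start with "ph_" are treated as phone numbers.
    if (strpos($field, 'ph_') === 0) {
        $post[$field] = extractNumbers($value);
    }
}

var_dump($post);
```

The trade-off is the one described above: any field that happens to start with ph_ gets processed, whereas the $tels array lists exactly which fields are phone numbers.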

I’m firing away answering your questions on this but I should also say that as a user, being fed back my phone number at some point, I would find it far easier to read 0123 123 1234 than 01231231234 – just to throw the cat among the pigeons… :wink:

Which would imply a few changes:



// rename the function 
function extractTelNumber($input){

// replace the dashes with spaces
$input = str_replace("-", " ", $input);

// permit spaces to be part of the phone number 
$output = preg_replace('#[^0-9 ]#', '', $input);

return $output;
}
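Going the other way, if only the 10 digits are stored, they can be formatted for display as (123) 123-1234. This is a sketch; formatTenDigits() is a hypothetical helper name.

```php
<?php
// Hypothetical display helper: format a stored 10-digit number as (123) 456-7890.
// Anything that is not exactly 10 digits is returned unchanged.
function formatTenDigits(string $digits): string
{
    return preg_replace('/^([0-9]{3})([0-9]{3})([0-9]{4})$/', '($1) $2-$3', $digits);
}

echo formatTenDigits('1231231234'), "\n"; // (123) 123-1234
echo formatTenDigits('12345'), "\n";      // 12345 (unchanged)
```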


Cups,

Thank you! I will get to work on this very soon. I currently have two functions that I have tacked on the end of the initialization file with my db access. Now I will have four functions. Would it make more sense to have a functions file with all the site functions in one place?

Cups,

Thanks for the help with the phone numbers. I now store just the numbers in the database, and every phone number is displayed formatted on the web pages. I will let users enter numbers in any format they prefer (I prefer to enter the 10 digits from the number pad on my keyboard). If there is a problem with an empty required field, I echo the formatted numbers back into the form.

Now I have to clean up my code (remove all the echo statements I used for troubleshooting) and find my INSERT INTO problem. :slight_smile: