Reg Exp Help - All Upper/Lower case names

ahundiak · November 9, 2011, 12:19pm

So I have a little registration system and most people take the time to case their names in a “normal” fashion i.e. “Bill Smith”. But some like to shout “BILL SMITH” and some are a bit shy “bill smith”.

Looking for an expression that will match if all letters are upper case and another expression for all lower case. Needs to ignore other characters.

I could then make a reasonable attempt at adjusting the names.

Thanks in advance.

Stormrider · November 9, 2011, 12:22pm

/^([A-Z]+\s[A-Z]+|[a-z]+\s[a-z]+)$/

Something like that?
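
For anyone trying this out, here is a rough sketch of how that pattern behaves next to a looser check that ignores non-letter characters, which is what the question asked for. The looser patterns and the helper name classifyCase are assumptions of mine, not something posted in this thread, and they only look at ASCII letters.

```php
<?php
// Hypothetical helper: returns 'upper', 'lower' or 'mixed' for ASCII names.
function classifyCase(string $name): string
{
    // At least one upper-case letter and no lower-case letters anywhere.
    if (preg_match('/^[^a-z]*[A-Z][^a-z]*$/', $name) === 1) {
        return 'upper';
    }
    // At least one lower-case letter and no upper-case letters anywhere.
    if (preg_match('/^[^A-Z]*[a-z][^A-Z]*$/', $name) === 1) {
        return 'lower';
    }
    return 'mixed';
}

// The two-word pattern from the post above, for comparison.
$twoWords = '/^([A-Z]+\s[A-Z]+|[a-z]+\s[a-z]+)$/';

$samples = ['BILL SMITH', 'bill smith', 'Bill Smith', "PETER O'PETERSON", 'PRINCE'];
foreach ($samples as $name) {
    printf(
        "%-18s two-word: %d  classify: %s\n",
        $name,
        preg_match($twoWords, $name),
        classifyCase($name)
    );
}
```

The two-word pattern only matches exactly two runs of ASCII letters separated by a single whitespace character, so single names, three-part names and names with apostrophes fall through. The looser check treats every non-letter as noise.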

system · November 9, 2011, 12:34pm

You could convert names to lower case and then use ucwords() to capitalise the first letter.

ahundiak · November 9, 2011, 12:58pm

I could, but then McReynolds gets messed up.
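
A small sketch of the problem, assuming the blanket approach is strtolower() followed by ucwords(). The function name naiveNormalise is made up for illustration.

```php
<?php
// Blanket normalisation: lower-case everything, then capitalise each word.
function naiveNormalise(string $name): string
{
    return ucwords(strtolower($name));
}

echo naiveNormalise('BILL SMITH'), "\n";  // Bill Smith
echo naiveNormalise('McReynolds'), "\n";  // Mcreynolds
```

Names that were already cased correctly lose their inner capitals, which is why it helps to detect the all-upper and all-lower cases first and leave everything else alone.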

StarLion · November 9, 2011, 1:54pm

Or just… if($str == strtoupper($str) || $str == strtolower($str)) and don’t make it so hard on yourself?
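
Wrapped up as a reusable check, that comparison might look like the sketch below. The function name isSingleCase is hypothetical, and I have swapped == for === so the test stays a plain string comparison.

```php
<?php
// Hypothetical wrapper around the comparison from the post above.
function isSingleCase(string $str): bool
{
    // === keeps this a plain string comparison (no numeric type juggling).
    return $str === strtoupper($str) || $str === strtolower($str);
}

var_dump(isSingleCase('BILL SMITH'));        // bool(true)
var_dump(isSingleCase("peter o'peterson"));  // bool(true)
var_dump(isSingleCase('Bill Smith'));        // bool(false)
var_dump(isSingleCase('12345'));             // bool(true): no letters at all
```

Non-letter characters are ignored for free because the case functions leave them unchanged. The flip side is that a string with no letters at all also passes. strtoupper() and strtolower() work on bytes, so for UTF-8 names the mb_strtoupper() and mb_strtolower() equivalents would probably be the safer choice.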

ahundiak · November 9, 2011, 1:55pm

That seems to work. Thanks.

ahundiak · November 9, 2011, 1:57pm

Yep but I get points for using regular expressions. Not really but my knowledge of how to create expressions is abysmal. Slowly building a working library.

StarLion · November 9, 2011, 1:58pm

Tell your teacher to stop rewarding making things more difficult.

ahundiak · November 9, 2011, 2:07pm

Now that I have my expression perhaps we could morph this thread into a discussion of expressions vs functions?

I like to use expressions when appropriate because:

1. Compact code - Not always a good thing, but as long as the expression is documented with some tests then it’s okay.
2. Easy to reuse - Just a string. Can always use a DEFINE to share, as opposed to making a custom function which then needs to be managed.
3. Speed - Pretty low on my list, but I do tend to process thousands of names in one request.
4. Professional - Looks like I know what I am doing.
5. Expressions are more or less standard and can be used in multiple programming languages.
6. Expressions are also quite common, so learning to read them is perhaps a good thing.
7. Expressions are useful for validation and can avoid the need for custom validation functions.
8. Teachers might give extra credit?

StarLion · November 9, 2011, 3:44pm

Go go SPF for eating my post and not posting it. Anyway, as usual this is the time for me to be contrarian, so:

#1: Regexes get very complicated very quickly. What you’re doing here is a 0.5 on a difficulty scale of 10.
#2: Predefined functions = no management either? (strtoupper and strtolower are PHP core functions.)
#3: Fairly certain regex is actually slower on a 1-to-1 vs strtoupper, but even then the differences will be so microscopic…
#4: You came here to ask, so… no you don’t? And honestly, using a regex in this capacity to me just makes it look like you don’t know the language.
#5: When was the last time you coded a single webpage in multiple languages? (EDIT: That WASN’T a class assignment)
#6: Learning is always good. Learning the right place to use the knowledge, better.
#7: Try regex’ing an email address to the RFC2822 standard real quick.
#8: Not a reason.

ahundiak · November 9, 2011, 5:11pm

#1: Regexes get very complicated very quickly. What you’re doing here is a 0.5 on a difficulty scale of 10.

Not mine! I know what you mean but .5 is pretty much the limit for me.

#2: Predefined functions = no management either? (strtoupper and strtolower are PHP Core Functions.)

Pretty sure if($str == strtoupper($str) || $str == strtolower($str)) is not a core function.

#3: Fairly certain regex is actually slower on a 1-to-1 vs strtoupper, but even then the differences will be so microscopic…

And if the goal was to upper case a string then sure. But it’s not so not sure of the relevance.

#4: You came here to ask, so… no you don’t? And honestly, using a regex in this capacity to me just makes it look like you don’t know the language.

I asked about an expression and was happy with the answer. I personally think preg_match($exp,$name) is better than two case shifts and two comparison operators, but to each their own.

#5: When was the last time you coded a single webpage in multiple languages? (EDIT: That WASN’T a class assignment)

Can’t really remember the last time I didn’t use multiple languages for each webpage. PHP/SQL/JavaScript/Annotations. All support regular expressions in a more or less consistent fashion. Not sure of the relevance though. I use a number of languages during a typical work week.

#6: Learning is always good. Learning the right place to use the knowledge, better.

Yep. That is why debating can be useful.

#7: Try regex’ing an email address to the RFC2822 standard real quick.

Why?
/**
* @Assert\NotBlank()
* @Assert\Email()
*/
public function getEmail() { return $this->person->getEmail(); }
Works fine for me.

#8: Not a reason.

It is if your instructor thinks it is.

StarLion · November 9, 2011, 7:23pm

It’s composed entirely of core elements though, none of which require maintaining by a user: IF, ==, and strtoupper/strtolower. If they ever change one of those elements, I’d probably stop using PHP.

And if the goal was to upper case a string then sure. But it’s not so not sure of the relevance.
In fact you’re right. Executing all of those horribly long and complicated statements saved you… 0.00145 seconds (rounded) of execution time across 5000 records, according to my tests. (PS: If you’re regexing more than that at a single time, you’re probably doing something wrong.)
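
The test code was not posted, so the following is only a minimal sketch of how such a comparison could be timed, not the benchmark that was actually run. The sample data is made up, timeCheck is a hypothetical helper, and timings will vary by machine, so no numbers are shown.

```php
<?php
// Build 5000 made-up sample names in three casing styles.
$names = [];
for ($i = 0; $i < 5000; $i++) {
    $names[] = $i % 3 === 0 ? 'BILL SMITH' : ($i % 3 === 1 ? 'bill smith' : 'Bill Smith');
}

// Hypothetical helper: runs $check over every name.
// Returns [elapsed seconds, number of names flagged].
function timeCheck(callable $check, array $names): array
{
    $hits = 0;
    $start = microtime(true);
    foreach ($names as $name) {
        if ($check($name)) {
            $hits++;
        }
    }
    return [microtime(true) - $start, $hits];
}

$regexCheck = function (string $s): bool {
    return preg_match('/^([A-Z]+\s[A-Z]+|[a-z]+\s[a-z]+)$/', $s) === 1;
};
$compareCheck = function (string $s): bool {
    return $s === strtoupper($s) || $s === strtolower($s);
};

$regexResult = timeCheck($regexCheck, $names);
$compareResult = timeCheck($compareCheck, $names);

printf("regex:   %.6f s, %d flagged\n", $regexResult[0], $regexResult[1]);
printf("compare: %.6f s, %d flagged\n", $compareResult[0], $compareResult[1]);
```

Both checks should flag the same names on this data, which is a useful sanity check before reading anything into the timings.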

I asked about an expression and was happy with the answer. I personally think preg_match($exp,$name) is better than two case shifts and two comparison operators, but to each their own.

I think that my statement is simpler to read than a -lot- of regex out there. Again, when you get beyond the 0.5 difficulty, this will become more clear to you. Just a question of application.

Can’t really remember the last time I didn’t use multiple languages for each webpage. PHP/SQL/JavaScript/Annotations. All support regular expressions in a more or less consistent fashion. Not sure of the relevance though. I use a number of languages during a typical work week.
At most, you should be validating twice - once at the JavaScript level (which is purely for show, and is subject to bypass, so should never be trusted), and once at the PHP level (which is actually trustworthy, if you do the validation right). So… no, I don’t agree with your assessment.

Why?
/**
* @Assert\NotBlank()
* @Assert\Email()
*/
public function getEmail() { return $this->person->getEmail(); }
Works fine for me.

Which… is a custom function. So… I don’t get your point here, except that you’re proving my statement true.

(The general form of a 2822 address is (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) ). Try reading that lol.
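
For what it’s worth, PHP also ships a built-in email check, so the common case needs neither a hand-written expression nor a custom function. A minimal sketch using filter_var() with FILTER_VALIDATE_EMAIL follows. The sample addresses are made up, and the filter does not claim to implement the full RFC grammar.

```php
<?php
// filter_var() returns the filtered string on success and false on failure.
$candidates = ['bill.smith@example.com', 'not an email'];

foreach ($candidates as $candidate) {
    $result = filter_var($candidate, FILTER_VALIDATE_EMAIL);
    printf("%-24s %s\n", $candidate, $result === false ? 'rejected' : 'accepted');
}
```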

TheOriginalH · November 9, 2011, 7:29pm

Lots of sad faces in a 2822 :devil:

StarLion · November 9, 2011, 7:41pm

It reflects all the points at which you lose pieces of your soul trying to regex something like that.

ahundiak · November 9, 2011, 9:47pm

I’m probably doing things wrong but when I have a bit of functionality that needs to be reused I tend to make a function even if all the functions inside are core elements.

In fact you’re right. Executing all of those horribly long and complicated statements saved you… 0.00145 seconds (rounded) of execution time across 5000 records, according to my tests. (PS: If you’re regexing more than that at a single time, you’re probably doing something wrong.)

You seem fixated on execution time. You really ran a benchmark?
I am surprised that you consider “preg_match('/^([A-Z]+\s[A-Z]+|[a-z]+\s[a-z]+)$/',$name)” to be long and complicated. Maybe I’m just smarter than most developers but to me it seems pretty basic.

I think that my statement is simpler to read than a -lot- of regex out there. Again, when you get beyond the 0.5 difficulty, this will become more clear to you. Just a question of application.

You seem to be saying that “if($str == strtoupper($str) || $str == strtolower($str))” can be used in place of all complicated expressions? I’m not so sure about that. I would think that more complicated expressions would probably require more complicated functions. In which case I don’t really understand your point. Are you saying that if I use simple expressions then I will also be forced to use more complicated ones? Is this something the interpreter enforces?

At most, you should be validating twice - once at the JavaScript level (which is purely for show, and is subject to bypass, so should never be trusted), and once at the PHP level (which is actually trustworthy, if you do the validation right). So… no, I don’t agree with your assessment.

I think I see part of my problem. I use expressions for things other than validation. Is that wrong? Expressions shall only be used for validation? If so then I fear I need to redo quite a bit of my SQL code.

Which… is a custom function. So… I don’t get your point here, except that you’re proving my statement true.

For me at least it makes sense to use simple expressions for simple tasks. Truly validating an email is not a simple task. Therefore, I don’t use expressions for email. Which is why your demand that I create one is very puzzling. I’m sure that my methodology will change once I gather sufficient experience but for the moment anyways, I try to use the best tool for the job at hand.

system · November 9, 2011, 11:02pm

ucwords capitalises only the first letter. Obviously you would have to add more code for “special” cases.

What is normally done is to store names in a database all in either upper or lower case (I normally store all in lower case) and then, after extracting names from the database, format the names however you like in the application (not the database) for output to whatever. It’s fairly straightforward and not rocket science.

ahundiak · November 9, 2011, 11:56pm

Interesting. I have never come across a modern application that always converts names to upper or lower. For my stuff at least, the all upper or all lower names only happen in a tiny number of cases. Probably people with tablets or smartphones that don’t like the shift key. And it’s just an annoyance more than anything. There are a lot more names like McReynolds than there are upper/lower only names. A lot of special code would be needed.

system · November 10, 2011, 12:17am

How you store names is up to you.

You can:

store names as entered by the user
store them in a consistent format like all upper or all lower case.

Personally, I store names in all lower case and then reformat them to what I need after extracting them from the database and before outputting to wherever.

If you let users enter names however they like and not reformat it in any way anywhere at all then you are likely to get at least a small number of outputs looking like BilL jONeS.

I have my own customised PHP class for reformatting names.

salathe · November 10, 2011, 1:06pm

How about names like PRINCE (one name), BILLY JOE JNR (more than two names), PETER O'PETERSON (non-A-Z)?

ahundiak · November 10, 2011, 11:23pm

Not exactly sure what your question is. The expression flags all of the above. When I do need to clean up a name I explode on space and then do a ucfirst on each one.
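
A minimal sketch of that cleanup step, assuming names are only touched when they are entirely upper or entirely lower case. The function name cleanName is hypothetical, and the single-case test here is the plain string comparison suggested earlier in the thread rather than any particular expression.

```php
<?php
// Hypothetical cleanup: only touch names that are entirely one case.
function cleanName(string $name): string
{
    $isSingleCase = $name === strtoupper($name) || $name === strtolower($name);
    if (!$isSingleCase) {
        return $name; // mixed case such as "McReynolds" is left alone
    }
    // Explode on space, then upper-case the first letter of each part.
    $parts = explode(' ', strtolower($name));
    return implode(' ', array_map('ucfirst', $parts));
}

echo cleanName('BILL SMITH'), "\n";        // Bill Smith
echo cleanName('billy joe jnr'), "\n";     // Billy Joe Jnr
echo cleanName('McReynolds'), "\n";        // McReynolds
echo cleanName("PETER O'PETERSON"), "\n";  // Peter O'peterson
```

The last line shows the limit of this approach: letters after an apostrophe or hyphen stay lower case, so those names would still need special handling.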