Regular Expression - Strip "non-keyboard" characters?

CreedFeed · December 31, 2012, 3:03pm

I am getting data I need to parse and store in a database and within this data are a bunch of non-standard English keyboard characters. What I’d like to do is strip out anything that does not appear on a standard English keyboard, but also keep the copyright symbol, registered symbol, and trademark symbols. Everything else should be stripped away.

An example of what I’m getting is:

TextÂ®

In this case, I’d like to run that through preg_replace and return

Text®

Can anyone help with this?

James_Hibbard · December 31, 2012, 4:04pm

Hi there,

This should work:

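<?php
header('Content-Type: text/html; charset=utf-8');
$String ="TexßßtÂ®";
// The u modifier makes PCRE treat the pattern and the input as UTF-8,
// so ® and © are matched as whole characters rather than as separate bytes.
echo preg_replace('/[^a-zA-Z®©]/u', '', $String);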

It strips out everything except a-z, A-Z, ® and ©.

CreedFeed · December 31, 2012, 4:10pm

So do I just have to specify the full list of characters I want to include, then? I want to keep everything you can type on a standard English keyboard (all of the symbols above the number keys, the brackets, forward slashes and backslashes, punctuation, new lines, and tabs).

James_Hibbard · December 31, 2012, 4:16pm

Ah ok.
Yeah, you basically do have to do that.
You can of course define character classes as above to make your life easier.
Also try experimenting with \w, which matches any word character (letters, digits, and the underscore).
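
To illustrate what the word-character class `\w` covers, here is a small sketch on plain ASCII input. The function name `keepWordChars` and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper: drop everything that is not a word character.
// For ASCII input, \w covers letters, digits and the underscore, so spaces
// and punctuation still have to be listed explicitly if they should survive.
function keepWordChars(string $text): string
{
    return preg_replace('/[^\w]/', '', $text) ?? '';
}

echo keepWordChars("a_b-c 1!"); // prints: a_bc1
```

So `\w` alone is not enough for the "whole keyboard" case. It only shortens the letters-and-digits part of the class.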

James_Hibbard · December 31, 2012, 4:17pm

Also, maybe this will help: http://stackoverflow.com/questions/1444666/regular-expression-to-match-all-characters-on-a-u-s-keyboard
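
Putting the suggestions in this thread together, here is a minimal sketch of the whitelist approach. It assumes PHP 7 or later, a source file saved as UTF-8, and input that is valid UTF-8. The function name `stripNonKeyboard` is made up for illustration. The class keeps printable ASCII (space through tilde), tabs, line breaks, and the three symbols from the original question.

```php
<?php
// Hypothetical helper, not from the thread: keep printable ASCII, tabs,
// line breaks, and the ®, © and ™ symbols; drop everything else.
function stripNonKeyboard(string $text): string
{
    // \x20-\x7E is the range from space to tilde. The u modifier makes PCRE
    // treat pattern and subject as UTF-8, so multi-byte characters are
    // matched (and removed) as whole characters.
    $clean = preg_replace('/[^\x20-\x7E\t\r\n®©™]/u', '', $text);

    // preg_replace() returns null on error, for example when the subject
    // is not valid UTF-8; this sketch falls back to an empty string.
    return $clean ?? '';
}

echo stripNonKeyboard("TextÂ®"); // prints: Text®
```

If the incoming data might not be valid UTF-8, it would need to be converted or validated first, since the `u` modifier rejects malformed input.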