creedfeed — 2012-12-31T10:03:49-05:00 — #1
I am getting data that I need to parse and store in a database, and it contains a bunch of characters that are not on a standard English keyboard. What I'd like to do is strip out anything that does not appear on a standard English keyboard, but keep the copyright, registered, and trademark symbols. Everything else should be stripped away.
An example of what I'm getting is:
In this case, I'd like to run that through preg_replace and return
Can anyone help with this?
pullo — 2012-12-31T11:04:38-05:00 — #2
This should work:
header('Content-Type: text/html; charset=utf-8');
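// The u modifier makes PCRE treat the pattern and $String as UTF-8, so ® and © are matched as whole characters rather than as separate bytes
echo preg_replace('/[^a-zA-Z®©]/u', '', $String);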
It strips out everything except for a-z, A-Z, ® and ©
creedfeed — 2012-12-31T11:10:12-05:00 — #3
So do I just have to specify the full list of characters I want to include, then? I want to keep everything you can type on a standard English keyboard (all of the symbols above the number keys, the brackets, forward/backward slashes, punctuation, new lines, and tabs).
pullo — 2012-12-31T11:16:08-05:00 — #4
Yeah, you basically do have to do that.
You can of course define character classes as above to make your life easier.
Also try experimenting with \w, which matches any word character (letters, digits, and underscore).
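For what it's worth, here is a rough sketch of what a "standard keyboard" whitelist could look like. It assumes the input is valid UTF-8 and that "standard English keyboard" means printable ASCII (space through ~) plus tab, newline, and carriage return. The function name keep_keyboard_chars and the sample string are made up for illustration; they are not from the data in the original question.

```php
<?php
// Hypothetical helper, not from the thread: keeps printable ASCII,
// tab/newline/carriage return, and the ©, ® and ™ symbols.
function keep_keyboard_chars($text)
{
    // \x20-\x7E covers space through ~ (letters, digits, punctuation).
    // The u modifier makes PCRE read the pattern and subject as UTF-8.
    $result = preg_replace('/[^\t\n\r\x20-\x7E©®™]/u', '', $text);

    // preg_replace returns null on failure, e.g. when the input is not valid UTF-8.
    return $result === null ? '' : $result;
}

// Made-up sample input: accented letter and curly quotes should be dropped.
echo keep_keyboard_chars("Café “Acme”™ © 2012\n"); // prints: Caf Acme™ © 2012
```

Using a range instead of typing out every symbol keeps the pattern short, and the script file itself has to be saved as UTF-8 for the literal ©, ® and ™ in the pattern to match.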
pullo — 2012-12-31T11:17:36-05:00 — #5