How to sanitize a title (strip illegal characters entirely)?

I was using the function below to clean post titles and create a GUID link and unique name. I thought it was working, but it doesn’t seem to be doing anything useful. For instance, this:

“l’sm,-$~~*&dfs$%*£!”!/"

is what the function returned as the sanitized title for “l’sm, $~~*&DFS$%*£!”!/".

I don’t quite understand the preg_replace function or how the patterns are put together. How would I go about eliminating these characters? I’m guessing that using a name like that in a URL will not end well.

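```php
// SANITIZE A TITLE FOR USE
// Note: this only strips HTML tags and entities and collapses whitespace.
// Characters such as $ * % ! are passed through untouched.
function cleanTitle($title, $optional = '', $type = '') {
    // Trim surrounding whitespace, then remove any HTML tags.
    $title = strip_tags(trim($title));
    // Remove HTML entities such as &amp; or &#39; (an & up to the next ;).
    $title = preg_replace('/&.+?;/', '', $title);

    // Fall back to $optional if nothing is left.
    if ('' === $title || false === $title) {
        $title = $optional;
    }

    // For 'cleanname': lowercase, and turn each run of whitespace into a hyphen.
    if ($type === 'cleanname') {
        $title = strtolower($title);
        $title = preg_replace('/\\s+/', '-', $title);
    }
    return $title;
}
```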
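A quick way to see why the output above still contains $, *, % and friends is to run cleanTitle() on a few inputs and watch which step touches what. This is a sketch: the function is copied unchanged from the post, and the sample titles are made up for illustration.

```php
<?php
// cleanTitle() copied from the post above, behaviour unchanged.
function cleanTitle($title, $optional = '', $type = '') {
    $title = strip_tags(trim($title));
    $title = preg_replace('/&.+?;/', '', $title);
    if ('' === $title || false === $title) {
        $title = $optional;
    }
    if ($type === 'cleanname') {
        $title = strtolower($title);
        $title = preg_replace('/\\s+/', '-', $title);
    }
    return $title;
}

// HTML tags are removed, but $ and ! pass straight through.
echo cleanTitle('<b>Hi</b> $5!', '', 'cleanname'), "\n";  // hi-$5!
// An entity such as &amp; is deleted; the leftover spaces become one hyphen.
echo cleanTitle('a &amp; b', '', 'cleanname'), "\n";       // a-b
// A bare & with no ; after it is not treated as an entity, so it stays.
echo cleanTitle('Fish & Chips', '', 'cleanname'), "\n";    // fish-&-chips
// A blank title falls back to $optional.
echo cleanTitle('   ', 'untitled'), "\n";                  // untitled
```

In other words, nothing in the function ever removes punctuation; it only deals with markup and whitespace.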

Not quite sure what you are after, but if you know what you want to allow (a white-list), then you can remove everything that is not on that list:


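```php
// Remove everything except digits, letters (upper and lowercase, thanks to
// the i modifier), spaces, and the characters . , /
$input = '0123 Big Street bc < < ?.,/#';
$output = preg_replace('#[^0-9a-z .,/]#i', '', $input);
// $output is now '0123 Big Street bc   .,/'
```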
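Applied to the title from the first post, the same white-list keeps only letters, digits, spaces and . , /. This is a sketch: whitelistTitle() is a hypothetical helper name, and the sample title is written with plain ASCII quotes, assuming the forum curled the ones in the original.

```php
<?php
// Hypothetical helper: keep only characters on the white-list
// (digits, letters of either case, space, and . , /) and drop the rest.
function whitelistTitle(string $input): string {
    return preg_replace('#[^0-9a-z .,/]#i', '', $input);
}

echo whitelistTitle('0123 Big Street bc < < ?.,/#'), "\n"; // 0123 Big Street bc   .,/
echo whitelistTitle('l\'sm, $~~*&DFS$%*!"!/'), "\n";       // lsm, DFS/
```

The spaces and / that survive would still need dealing with before the result goes into a URL, which is what the “slug” discussion further down the thread is about.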

Mostly I want to remove anything that would not be acceptable in a URL, as this particular function should create a unique text name for each post that could be used in a link such as www.sitepoit.com/post.php?title=hello-i-am-a-title

In which case I believe your code would be fine if it removes everything except numbers and letters. Though, looking at the URL of this thread, it leaves in () as well. Is there a good learning resource for preg_replace besides the PHP Manual? It doesn’t really explain how to write the kind of filter you want. Looking at your code, I can understand what it is doing up to the ‘z’, and obviously the ‘’ is the replacement string, but I don’t follow what the ‘.,/]#i’ is achieving.

Also, thank you for the prompt help.

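[^ - a ^ straight after the [ negates the class, so it matches any character NOT listed
.,/ (and the space before them) - permit those chars
] - end of character class definition
# - end of the delimiter “#[load of rules]#”; I could use almost any non-alphanumeric character. You usually see “/[load of rules]/”, but then the / inside the class would have to be escaped as \/
i - the modifier meaning ignore case; there are others, e.g. s = let . match newlines too, etc.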
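To make the delimiter and modifier points concrete, here is a small sketch comparing the same class written with # and with / as the delimiter, plus what happens without the i modifier. The sample address string is made up.

```php
<?php
$input = 'Big Street, Flat 2/3 (rear) #4';

// '#' as the delimiter: the '/' inside the character class needs no escaping.
$hash = preg_replace('#[^0-9a-z .,/]#i', '', $input);

// '/' as the delimiter: same class, but the '/' inside it must be written '\/'.
$slash = preg_replace('/[^0-9a-z .,\/]/i', '', $input);

// Without the i modifier only lowercase a-z are allowed,
// so the capital B, S and F are stripped as well.
$caseSensitive = preg_replace('#[^0-9a-z .,/]#', '', $input);

var_dump($hash === $slash);  // bool(true)
echo $hash, "\n";            // Big Street, Flat 2/3 rear 4
echo $caseSensitive, "\n";   // ig treet, lat 2/3 rear 4
```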

Going back to the nub of your problem, is it the case that you want to take a title such as

“Roberts my mothers’ brother (in law)”

and turn it into a url-friendly string:

“roberts-my-mothers-brother-in-law”

Or are you trying to strip bad characters from an entire URL (http://www. etc etc)?

The first one: create a URL-friendly string.

Thanks for the explanation about preg_replace.

In that case, what you are creating is often termed a “slug”.

Search this forum for the words “slug” or “slugify” to find quite a bit of discussion on this matter, covering not only how to create them but also how to store and use them, especially in tandem with Apache’s mod_rewrite.
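As a starting point, here is one common way a slugify function is written; a sketch, not the only approach. slugify() and its $fallback parameter are hypothetical names, the input is assumed to be UTF-8, and anything outside a-z and 0-9 (including accented letters) is simply dropped rather than transliterated.

```php
<?php
// Hypothetical slugify(): turn a post title into a URL-friendly slug.
function slugify(string $title, string $fallback = 'post'): string {
    // Delete apostrophes (straight and curly U+2019) first, so that
    // "mothers'" becomes "mothers" rather than "mothers-".
    $slug = str_replace(["'", "\u{2019}"], '', $title);
    $slug = strtolower($slug);
    // Collapse every run of characters that are not a-z or 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Drop hyphens left at either end, e.g. from a closing ")".
    $slug = trim($slug, '-');
    // A title made only of symbols would otherwise give an empty slug.
    return $slug === '' ? $fallback : $slug;
}

echo slugify("Roberts my mothers\u{2019} brother (in law)"), "\n"; // roberts-my-mothers-brother-in-law
echo slugify('l\'sm, $~~*&DFS$%*!"!/'), "\n";                       // lsm-dfs
```

Note that a slug is not unique by itself: two posts with the same title produce the same slug, so you would still need to check for an existing one (and, say, append a number) before saving.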

Come back if you cannot find the discussions or have any questions.

Ah, OK :slight_smile: Thank you for your help, Cups. I will go and investigate slugs.

Uhm…
http://php.net/manual/en/function.urlencode.php

Just saying…
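For comparison, urlencode() and rawurlencode() keep every character but percent-encode the unsafe ones, so the title survives the trip through a query string intact; it just isn’t as readable as a slug. A short sketch of the difference, using made-up titles:

```php
<?php
$title = 'hello i am a title';

// urlencode(): spaces become '+', the form-encoding convention for query strings.
echo urlencode($title), "\n";             // hello+i+am+a+title
// rawurlencode(): RFC 3986 style, spaces become %20.
echo rawurlencode($title), "\n";          // hello%20i%20am%20a%20title
// Characters with special meaning in a URL are escaped rather than removed.
echo urlencode('Fish & Chips (2)'), "\n"; // Fish+%26+Chips+%282%29

// http_build_query() applies the same encoding when building the link.
echo 'post.php?' . http_build_query(['title' => 'Fish & Chips (2)']), "\n";
// post.php?title=Fish+%26+Chips+%282%29
```

PHP decodes the value automatically on the receiving end, so $_GET['title'] gives back the original string. Encoding is the right tool when the exact title must round-trip; a slug is the usual choice when you want a short, readable link.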