Regular expression - matching meta content-type

Hello,

I have the following code for matching meta content-type (charset):

preg_match( '@<meta\\s+http-equiv="Content-Type"\\s+content="([\\w/]+)(;\\s+charset=([^\\s"]+))?@i', $html, $matches );
	//if ( isset( $matches[1] ) ) $mime = $matches[1];
	print_r($matches);
	if ( isset( $matches[3] ) ) {
		$charset = $matches[3];
		return $charset;
	}

This won’t match the following HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><title>
	Det andet Andalusien - Solferie - Livsstil - Nordjyske.dk
</title><link href="/css/nj_style.css" rel="stylesheet" type="text/css" media="all" />

Why is this?

What does the @ symbol mean in a regex (I couldn't find the info on the net)?

Many thanks for any help!

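I don’t think that the @ symbol has any special meaning in regular expressions; here it is just acting as the pattern delimiter.
Normally the forward slash is used at the start and end of a regular expression, but PHP accepts other non-alphanumeric characters there, such as the at symbols seen in your pattern.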
See http://php.net/manual/en/function.preg-match.php
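To illustrate the delimiter point, here is a minimal sketch. The `$html` string and the `href` pattern are made up for this example and are not from the original post. It runs the same pattern once with `/` and once with `@` as the delimiter:

```php
<?php
// Hypothetical input, used only to illustrate delimiters.
$html = '<a href="http://example.com/page">link</a>';

// With "/" as the delimiter, every literal slash in the pattern must be escaped.
preg_match('/href="http:\/\/([^\/"]+)/', $html, $withSlash);

// With "@" as the delimiter, slashes can be written as-is.
preg_match('@href="http://([^/"]+)@', $html, $withAt);

// Both calls capture the same host name in group 1.
var_dump($withSlash[1] === $withAt[1]);
```

This is probably why the original pattern uses @: it contains a literal slash in `[\w/]`, which would otherwise need escaping.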

I just tested the regex, and the pattern itself works correctly.

I think your problem might be with the array indexes of your $matches variable.

Array indexes start at 0, rather than 1: $matches[0] holds the full match, and the captured groups follow from $matches[1] onwards.

If you’re trying to get the meta tag’s attribute values, you may have to look at regex groups/grouping.
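As a rough sketch of how the groups in the pattern map onto the $matches array, assuming a one-line `$html` string reduced to just the meta tag from the question:

```php
<?php
// Assumed input: only the meta tag from the sample page.
$html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
$pattern = '@<meta\\s+http-equiv="Content-Type"\\s+content="([\\w/]+)(;\\s+charset=([^\\s"]+))?@i';

if (preg_match($pattern, $html, $matches)) {
    // $matches[0]: the whole matched text
    // $matches[1]: first group, the MIME type
    // $matches[2]: second group, the optional "; charset=..." part
    // $matches[3]: third group, the charset value on its own
    print_r($matches);
}
```

Groups are numbered by the position of their opening parenthesis, which is why the charset ends up in index 3 rather than 2.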

Are you sure that it will not match the sample HTML that you provided? Using your sample code:


$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><title>
	Det andet Andalusien - Solferie - Livsstil - Nordjyske.dk
</title><link href="/css/nj_style.css" rel="stylesheet" type="text/css" media="all" />';

preg_match('@<meta\\s+http-equiv="Content-Type"\\s+content="([\\w/]+)(;\\s+charset=([^\\s"]+))?@i', $html, $match);
var_dump($match[1], $match[3]);

Outputs:

string(9) "text/html"
string(5) "utf-8"