XHTML vs HTML FAQ

AutisticCuckoo · June 16, 2006, 11:55am

Frequently Asked Questions About XHTML vs HTML

Table of contents

1. What is XHTML?
2. How is XHTML different from HTML?
3. How is XHTML 1.1 different from XHTML 1.0?
4. What about XHTML 2.0?
5. Should I use XHTML or HTML?
6. Should I use XHTML 1.0 or XHTML 1.1?
7. Why do so many books and sites recommend XHTML over HTML?
8. Is XHTML supported by all browsers?
9. Is XHTML more strict than HTML?
10. Is XHTML more semantic than HTML?
11. Can you use CSS with both XHTML and HTML?
12. What does the XML declaration do? Should I use it?
13. How is the DOCTYPE declaration used?
14. Do I need the xmlns attribute in my <html> tag?
15. How does this MIME type thingy work?
16. Can I set the MIME type with a <meta/> tag?
17. How do I serve XHTML with the proper MIME type?
18. Is serving XHTML as text/html harmful?
19. What’s the benefit of serving XHTML as application/xhtml+xml?
20. Should I use content negotiation when serving XHTML?
21. Which character encoding should I use for XHTML?
22. Can I use XHTML files on my hard drive?
23. Can I use XHTML with Internet Explorer?
24. Will IE7 or IE8 support XHTML?

What is XHTML?
XHTML 1.0 is a ‘reformulation of HTML 4 as an XML 1.0 application’, according to the XHTML 1.0 specification.
In other words, it is an XML-based markup language that has the same set of element types and attributes as HTML 4.

How is XHTML different from HTML?
XHTML is fundamentally different from HTML, despite looking very similar.

XHTML is XML, which means that the syntax rules are slightly different.
There are things you can do in XHTML which you cannot do in HTML.
There are things you can do in HTML which you cannot do in XHTML.
There are differences concerning CSS.
There are differences concerning client-side scripting (e.g., JavaScript).

Differences in Syntax Rules

XHTML is case-sensitive, HTML is not. All tags and attributes must be lowercase in XHTML.
XHTML, being XML, must be well-formed. Every element must have an end tag, or use the self-closing tag syntax. HTML allows some end tags and even some start tags to be omitted.
If an XML parser encounters a well-formedness error, it must abort. An SGML or HTML parser is expected to try to salvage what it can and keep going.
All attributes must have a value in XHTML. HTML allows some attributes (e.g., selected) to be minimised.
All attribute values must be surrounded by double or single quotes. HTML allows quotes to be omitted if the value contains only alphanumeric characters (and some others).
The comment syntax is more limited in XHTML, but that’s rarely an issue for most designers/developers.
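
The difference in error handling can be sketched outside a browser. The following is only an illustration, assuming Python's standard library as a stand-in: xml.etree.ElementTree plays the XML parser and html.parser plays a forgiving tag reader. The sample markup and the names xml_is_well_formed and TagCollector are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Made-up sample: the <p> elements are never closed, which HTML allows
# but which is a well-formedness error in XML.
markup = "<div><p>First paragraph<p>Second paragraph</div>"
fixed = "<div><p>First paragraph</p><p>Second paragraph</p></div>"


def xml_is_well_formed(text):
    """Return True if the XML parser accepts the text, False if it aborts."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Forgiving reader: records every start tag it sees and keeps going."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


collector = TagCollector()
collector.feed(markup)
collector.close()

xml_ok = xml_is_well_formed(markup)       # the XML parser gives up
xml_fixed_ok = xml_is_well_formed(fixed)  # explicit end tags make it well-formed
```

The forgiving reader still reports all three start tags for the unclosed markup, while the XML parser rejects the document outright.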

Things You Can Do in XHTML But Not In HTML

Use CDATA sections (<![CDATA[ … ]]>). That’s useful if you have content with lots of literal characters that otherwise need to be escaped.
Use PIs (processing instructions), e.g., to link to a style sheet:
<?xml-stylesheet type="text/css" href="style.css" media="screen"?>
Include elements from other XML namespaces (see below).
Use the &apos; character entity.

Things You Can Do in HTML But Cannot Do in XHTML

‘Hide’ the contents of style or script elements with SGML comments (<!-- … -->).
Create parts of the page dynamically with JavaScript while the document is still loading (e.g., using document.write()).
Use named character entities (e.g., &nbsp;) other than the four predefined ones: &lt;, &gt;, &amp; and &quot;.
Use the .innerHTML property with JavaScript (technically this is non-standard even in HTML).
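
The points about entities and CDATA sections can be tried out with any non-validating XML parser that does not read the DTD. Here is a small sketch, assuming Python's xml.etree.ElementTree as that parser (browsers may behave differently); the helper name parses_as_xml and the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(text):
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# &apos; is predefined in XML, so it needs no DTD.
apos_ok = parses_as_xml("<p>It&apos;s fine</p>")

# &nbsp; is defined only in the (X)HTML DTDs; without them it is undefined.
nbsp_ok = parses_as_xml("<p>a&nbsp;b</p>")

# A numeric character reference is always safe.
numeric_ok = parses_as_xml("<p>a&#160;b</p>")

# Inside a CDATA section, < and & need no escaping.
cdata_text = ET.fromstring("<p><![CDATA[a < b && c > d]]></p>").text
```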

Differences Concerning CSS

Element type selectors in CSS are case sensitive for XHTML, but not for HTML.
In HTML, the properties background-color, background-image and overflow on the BODY element will be applied to the root element (HTML) unless specified for that element also. That is not the case for XHTML.

In HTML some start tags are optional, but the element node exists in the document object model even if the tags don’t occur in the markup. If we want to style header cells in the table body, we might use a CSS rule like this one:

tbody th {text-align:left}

In HTML, this will work even if we omit the <tbody> and </tbody> tags in our markup, because the TBODY element will be created anyway. That will not work in XHTML; unless we have explicit <tbody> and </tbody> tags, the selector will not match.
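
The XML side of this is easy to demonstrate: an XML parser builds exactly the elements that occur in the markup and nothing more. The sketch below assumes Python's xml.etree.ElementTree in place of a browser, and the function header_cells_in_tbody is a made-up, rough equivalent of the selector tbody th.

```python
import xml.etree.ElementTree as ET

without_tbody = ET.fromstring("<table><tr><th>Name</th></tr></table>")
with_tbody = ET.fromstring(
    "<table><tbody><tr><th>Name</th></tr></tbody></table>"
)


def header_cells_in_tbody(table):
    """Collect the text of every th that has a tbody ancestor."""
    return [
        th.text
        for tbody in table.iter("tbody")
        for th in tbody.iter("th")
    ]


implicit = header_cells_in_tbody(without_tbody)  # no tbody element exists
explicit = header_cells_in_tbody(with_tbody)     # the tbody is in the markup
```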

Differences Concerning JavaScript

document.write() cannot be used with XHTML (see Why document.write() doesn’t work in XML)
DOM methods like createElement() must be replaced by their namespace-aware counterparts (createElementNS() etc.).
The non-standard .innerHTML property should not be used for XHTML documents.
The same issues with implicit elements that occur for CSS also apply for JavaScript.
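
The DOM method names are the same in other DOM implementations, so the difference between createElement() and createElementNS() can be shown without a browser. This is only a sketch, assuming Python's xml.dom.minidom; a browser's DOM may differ in details.

```python
from xml.dom.minidom import getDOMImplementation

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Create an empty document whose root element is an XHTML html element.
document = getDOMImplementation().createDocument(XHTML_NS, "html", None)

# createElement() yields an element in no namespace at all ...
plain = document.createElement("p")

# ... while createElementNS() puts it in the XHTML namespace.
namespaced = document.createElementNS(XHTML_NS, "p")

plain_ns = plain.namespaceURI
namespaced_ns = namespaced.namespaceURI
```

An element without the XHTML namespace is just a generic XML element called p, not an XHTML paragraph.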

How is XHTML 1.1 different from XHTML 1.0?
XHTML 1.1 is a reformulation of XHTML 1.0 Strict using Modularization of XHTML, which simply means that the definitions of the various element types have been separated into a number of modules.

XHTML 1.1 removes the lang attribute (in favour of xml:lang) and also the name attribute for the a and map elements (in favour of id). It also adds a number of elements for Ruby annotations.

What about XHTML 2.0?
What about it? It shows no signs of becoming even a candidate recommendation any time soon. We don’t know what it will contain, but it seems as if it is not going to be backwards compatible with XHTML 1.0.

Should I use XHTML or HTML?
That depends on who you ask. There are a number of technical issues with this question, which preclude a simple and short answer. In reality, the latest W3C recommendation with widespread support is HTML 4.01. Unless you actually need any of the features that XHTML offers over HTML, there is no technical reason to use XHTML.

In order to actually benefit from using XHTML, you really need to understand the fundamental differences between XHTML and HTML, and you need to serve your documents as XML. Such a site will only be available to a small minority of the surfing population, however, since Internet Explorer does not support XHTML.

Some web designers and developers prefer XHTML’s syntax rules over HTML’s. By following certain guidelines, you can use this syntax without technically using XHTML at all (see below). There are a number of potential pitfalls with this approach, but it is a possible way forward for those who absolutely want to type <br /> instead of <br>.

For ‘future-proofing’ your documents, using a Strict doctype is more important than whether you use XHTML or HTML.

Should I use XHTML 1.0 or XHTML 1.1?
Unless you need to use Ruby annotations, and your target audience can be expected to have the required plug-ins for that, you should not use XHTML 1.1.

In particular, if you are serving your XHTML markup as text/html (see below), you must not use XHTML 1.1. Since it removes the lang attribute, it is not backwards compatible with HTML and must not be served as such.

Why do so many books and sites recommend XHTML over HTML?
When the XHTML 1.0 specification was released, many designers and developers were quite excited about it. It was XML, which was all the rage back then, yet could be used as if it were HTML, and it ‘worked in all browsers’. People saw countless possibilities with the extensibility mechanism, and when W3C stated that there would be no more versions of HTML, XHTML was seen as the future-proof alternative.

Eventually some less palatable aspects of using XHTML were uncovered and the extensibility myth was debunked, but this didn’t receive quite the same amount of publicity. Many authors thus still advocate XHTML over HTML out of ignorance or because of personal preference.

Is XHTML supported by all browsers?
No. Only a few mainstream browsers support XHTML, like Opera, Firefox and Safari.

Most importantly, Internet Explorer does not support XHTML at all.

If you follow certain guidelines you can serve XHTML documents as text/html (see below). That means the document will be seen as HTML, which all browsers can handle. This only works because virtually all browsers have a parser bug that makes them ignore the slash in self-closing tags.

Is XHTML more strict than HTML?
No. The syntax rules of XML (and thus XHTML) are simpler and more consistent, but both XHTML and HTML can be parsed unambiguously as long as the markup is valid.

Is XHTML more semantic than HTML?
No. XHTML 1.0 is just a reformulation of HTML 4.01. It contains the same elements and attributes and comes in the same three flavours (DTDs). There is no difference in semantics.

Can you use CSS with both XHTML and HTML?
Yes. You sometimes see preposterous claims that CSS can only be used with XHTML, but that is just disinformation. The first CSS specification came out in 1996, four years before XHTML.

What does the XML declaration do? Should I use it?
The XML declaration (sometimes incorrectly called the XML prologue) looks something like this:

<?xml version="1.0" encoding="utf-8"?>

It tells an XML parser that the document is an application of XML 1.0 and which character encoding it uses. If the encoding is anything other than UTF-8 or UTF-16 you must use the XML declaration, unless the web server sends encoding information in its HTTP headers. Even if it does, you should use the XML declaration, so that the right encoding is specified even if the document is saved to disk and opened locally.
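
The effect of the encoding declaration can be sketched with a non-browser XML parser. The example assumes Python's xml.etree.ElementTree and a made-up one-element document encoded as ISO-8859-1; no HTTP headers are involved, which mirrors the case of a file opened from disk.

```python
import xml.etree.ElementTree as ET

text = "<p>caf\u00e9</p>"  # contains one non-ASCII character (e with acute)

# The same bytes, with and without an XML declaration naming the encoding.
declared = (
    '<?xml version="1.0" encoding="iso-8859-1"?>' + text
).encode("iso-8859-1")
undeclared = text.encode("iso-8859-1")

# With the declaration, the parser decodes the bytes correctly.
declared_text = ET.fromstring(declared).text

# Without it, the parser assumes UTF-8 and the byte 0xE9 is not valid there.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```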

This applies when XHTML documents are served as such. When served as text/html, the XML declaration should be ignored, but some old HTML-only browsers can choke on it. In particular, Internet Explorer 6 will render the document in quirks mode if there is an XML declaration before the DOCTYPE declaration. In these cases, you should omit the XML declaration, since the document is not treated as XML anyway.

How is the DOCTYPE declaration used?
One may be led to believe that the DOCTYPE declaration at the top of the document is what tells the user agent that it is an XHTML document. However, this is not the case. The original purpose of the DOCTYPE declaration only had to do with markup validation. A validator needs to know against which document type definition (DTD) to check for compliance. Browsers don’t use validating parsers, because there is no need, so they used to ignore the DOCTYPE.

When IE5/Mac was launched, it had a novelty feature: doctype switching. Its support for web standards was a major improvement compared to older versions, and compared to its contemporary cousin on the Windows platform. In order to provide good standards support and still avoid breaking the millions of web sites that were written to accommodate IE’s incorrect CSS rendering, the DOCTYPE declaration was used to make an educated guess as to whether the document was ‘modern’ or ‘old-school’. This feature was then included in IE6/Win, and can now be found in most modern browsers.

So the DOCTYPE declaration serves two purposes: it tells a validator against which DTD the document claims conformance, and it is used by browsers to determine the rendering mode to use. It has absolutely nothing to do with the XHTML vs HTML issue, however. Browsers that support XHTML use the ‘strict standards’ rendering mode for XHTML documents, provided that they are served as such.

Do I need the xmlns attribute in my <html> tag?
Yes. That is what tells user agents that the document is, in fact, XHTML, rather than any other application of XML. If the xmlns attribute is missing, or doesn’t contain the right value, the markup will not be recognised as XHTML. The attribute is invalid in HTML, and will thus be ignored if the document is served as text/html. The correct value to use is

xmlns="http://www.w3.org/1999/xhtml"

Namespaces in XML allow us to use the same element type name for different elements. For instance, a fictive WidgetML markup language can use an element type called label. By declaring a separate namespace for WidgetML, we can use those label elements in an XHTML page, even if that contains a form with label elements, and the browser will have no problem keeping them apart.

An XML namespace is bound to a URI. The XHTML namespace mentioned above is one example. If we want to include WidgetML elements throughout our XHTML document, we can use a prefix and bind the WidgetML namespace to that:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:w="http://example.com/ns/widgetml"
      xml:lang="en">

This binds the prefix ‘w’ to the WidgetML namespace. To separate XHTML’s label elements from WidgetML’s, we use the prefix in our tags: <w:label>Blue Widget</w:label>.
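
How a namespace-aware parser keeps the two kinds of label apart can be sketched as follows. The example assumes Python's xml.etree.ElementTree rather than a browser, and the page content is invented around the fictive WidgetML namespace used above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
WIDGET_NS = "http://example.com/ns/widgetml"  # fictive namespace from the text

source = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:w="http://example.com/ns/widgetml"
      xml:lang="en">
  <body>
    <form action="#"><p><label>Colour</label></p></form>
    <w:label>Blue Widget</w:label>
  </body>
</html>"""

root = ET.fromstring(source)

# ElementTree identifies each element by {namespace URI}local-name,
# so the two label element types never collide.
xhtml_labels = [el.text for el in root.iter("{%s}label" % XHTML_NS)]
widget_labels = [el.text for el in root.iter("{%s}label" % WIDGET_NS)]
```

The prefix itself carries no meaning; it is the URI it is bound to that identifies the namespace.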

How does this MIME type thingy work?
When a resource is requested via the HTTP protocol, the web server sends an HTTP response consisting of one or more headers, a blank line (CR+LF) and the document body. For a web page, the body is the HTML or XHTML document, i.e., the markup we write.

The HTTP headers provide meta-information about the document. One of the most important headers is Content-Type, which informs the user agent what type of content the response body contains. It may also convey information about which character encoding it uses. For HTML, such a header can look like this:

Content-Type: text/html; charset=iso-8859-1

The text/html part consists of a MIME media type name (text) and subtype name (html). The charset part is an optional attribute.
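
The parts of such a header can be picked apart programmatically. This is a minimal sketch, assuming Python's email.message.Message as a convenient header parser; a user agent would use its own HTTP machinery.

```python
from email.message import Message

# Build a header set containing only the Content-Type header from the text.
headers = Message()
headers["Content-Type"] = "text/html; charset=iso-8859-1"

media_type = headers.get_content_maintype()  # the part before the slash
subtype = headers.get_content_subtype()      # the part after the slash
charset = headers.get_param("charset")       # the optional attribute
```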

According to RFC 2854 this MIME type identifies the content as HTML, which means that user agents must parse and interpret the contents as HTML. Even if it’s actually a Microsoft Word document, a GIF image … or an XHTML document. In other words, if the MIME type is text/html, the document is HTML (not XHTML).

This means that no XML-only features can be used. It also means that HTML-only features can be used, but doing so defies the purpose of using XHTML markup in the first place.

There are three MIME types that we can use for XHTML documents, which will make compliant user agents recognise the document as XML:

application/xhtml+xml (recommended)
application/xml
text/xml (not recommended)

The recommended MIME type for XHTML is application/xhtml+xml, which is defined in RFC 3236. Note, however, that this is not supported by any version of Internet Explorer at this time (June 2006).

Although it is possible to use text/xml (defined in RFC 3023), it is not recommended due to the odd way the default character encoding is specified. With this MIME type, the encoding must be sent in the HTTP header; it cannot be overridden by an XML declaration. It also defaults to the not-very-useful encoding US-ASCII.

Can I set the MIME type with a <meta/> tag?
No. A user agent needs to know the content type before it starts parsing the response body. When it encounters an element like this, it’s already too late:

<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>

The MIME type must be sent as a Content-Type HTTP header. The character encoding should be specified in the XML declaration (see above).

How do I serve XHTML with the proper MIME type?
You need to instruct your web server to send the proper HTTP header. The exact technique depends on which HTTP server you use, but for Apache, you can use the AddType directive:

AddType application/xhtml+xml .xhtml .xht

This makes Apache send an application/xhtml+xml MIME type for files ending with .xhtml or .xht. The directive can be put in the global configuration file (httpd.conf, whose exact location under /etc varies between *nix systems) or in a local .htaccess file in a directory.

Sometimes we don’t have access to the configuration files, but we need not lose hope quite yet. If we can use a server-side scripting language, we can send the HTTP header ourselves. For instance, using PHP:

header('Content-Type: application/xhtml+xml; charset=utf-8');

(Note that this header must be sent before a single byte of document content is written to the response stream.)
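
The same idea works in other server-side languages. As a sketch only, here is a tiny Python WSGI application that sends the header itself; the page content and the names XHTML_PAGE and application are made up for the example, and a real site would of course generate its own markup.

```python
# A made-up, minimal XHTML 1.0 Strict page used as the response body.
XHTML_PAGE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Test page</title></head>
  <body><p>Served as XHTML.</p></body>
</html>
"""


def application(environ, start_response):
    """WSGI entry point: send the XHTML MIME type, then the document."""
    body = XHTML_PAGE.encode("utf-8")
    # The headers are passed to the server before any body bytes are returned.
    start_response("200 OK", [
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("localhost", 8000, application).serve_forever() should be enough for testing purposes.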

Is serving XHTML as text/html harmful?
In 2002 Ian Hickson published an article labelled Sending XHTML as text/html Considered Harmful. It has been criticised by many XHTML proponents, but it should be required reading for anyone who is going to use XHTML markup.

Serving XHTML documents as text/html is not necessarily harmful, if you know what you are doing and are aware of the fundamental differences between XHTML and HTML. Relying on HTML-only techniques, however, is ‘harmful’, because that means that a purported XHTML document will not work as XHTML.

Thus, if you are going to serve XHTML documents as text/html, you must make sure that they also work as intended when served as application/xhtml+xml.

You must also make sure to follow all the guidelines in Appendix C of the XHTML 1.0 specification. Although this appendix isn’t normative, it offers guidelines for maintaining the compatibility that is required for serving XHTML markup to user agents that only support HTML. Including a blank character before the /> in a self-closing tag, for instance, is necessary to avoid confusing some old browsers. For XHTML served as XML, no such space is necessary since all XML parsers understand self-closing tags.

What’s the benefit of serving XHTML as application/xhtml+xml?
That the document is recognised as XHTML by user agents.

Presumably, you are using XHTML for a reason. Unless the document is recognised as XHTML, you cannot use any of the features XHTML offers over HTML.

Should I use content negotiation when serving XHTML?
Content negotiation means examining the Accept HTTP header sent by user agents and serving different content types to different user agents. For instance, Opera, Firefox and Safari state that they support application/xhtml+xml, so they would receive XHTML markup with that MIME type. Meanwhile, browsers like Internet Explorer would receive HTML markup served as text/html.
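
Purely to illustrate what ‘examining the Accept header’ involves, here is a simplified sketch in Python. The function names are made up, the parsing ignores wildcards and relative preferences between types, and the header strings in the usage note are only illustrative.

```python
def accepts_xhtml(accept_header):
    """Return True if the header explicitly lists application/xhtml+xml
    with a quality value greater than zero."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # the default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        return quality > 0
    return False


def choose_content_type(accept_header):
    """Pick the MIME type a negotiating server would send."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml"
    return "text/html"
```

A header such as "text/html,application/xhtml+xml;q=0.9,*/*;q=0.5" would select application/xhtml+xml, whereas a bare "*/*" would fall back to text/html.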

There is currently no point in doing this, other than to impress other computer geeks with your knowledge. If the content can be transformed into HTML it doesn’t require any XML features. You might as well use HTML 4.01 or serve XHTML as text/html to all user agents.

It is especially useless to do content type negotiation, i.e., sending the same content to everyone, but sending different Content-Type headers to different browsers.

Which character encoding should I use for XHTML?
XML parsers are only required to support UTF-8 and UTF-16. If you use anything other than that, there is no guarantee that the parser can interpret the document correctly. In reality, browsers generally seem to support the same range of encodings as for HTML, but if you want to be on the safe side, stick to UTF-8 or UTF-16.

Can I use XHTML files on my hard drive?
When a web page is opened from the local hard drive, there is no HTTP server involved to send the proper HTTP headers. The file extension is then often used to make an educated guess about the content type. Opera and Firefox will assume an XML content type for files ending with .xhtml, .xht or .xml.

This will not work for Internet Explorer, of course, since it doesn’t support XHTML.

Can I use XHTML with Internet Explorer?
No. Not really.

IE does not support the application/xhtml+xml MIME type, and will prompt the user to download the page if it’s served as such. You can make IE recognise this MIME type through a registry hack, but it will still treat it as HTML.

If you need the XML features of XHTML, you can serve the document as application/xml. That is supported by IE, but XHTML’s namespace is not, which means IE will see the document as generic XML. There will be no default style sheet, so you have to specify explicit rules for every element type (including display:block for all block-level elements).

You can, of course, serve XHTML markup as text/html, but as has been mentioned above that means the document will be seen as HTML with syntax errors.

Will IE7 or IE8 support XHTML?
No.

Shyflower · June 16, 2006, 12:48pm

Excellent Tommy! Thanks! :tup:

stymiee · June 16, 2006, 12:59pm

Stickied! :tup:

system · June 16, 2006, 1:07pm

Thanks for this, Tommy.

Enoch_Root · June 16, 2006, 1:48pm

Hi Tommy,

It has been criticised by many XHTML proponents

Who, where, and why?

Smashing FAQ by the way.

chris_fuel · June 16, 2006, 1:52pm

Yah! A centralized place to point some of my friends that keep bringing up preposterous myths about this.

system · June 16, 2006, 3:39pm

This goes for Safari (well, I checked on 2.0.3) as well, except it doesn’t seem to recognize (or open) files with the .xht extension.

Bleys · June 16, 2006, 3:49pm

Nice work, Tommy.

AutisticCuckoo · June 16, 2006, 4:01pm

One that leaps to my mind is Faruk Ateş, with whom I’ve debated the pros and cons of XHTML a few times.

Use XHTML or HTML 4.01 Doctype? (thread 251158)
Why XHTML? What is the point? (thread 223647)

Thanks, Egor. :tup: I suspected that, but since I don’t know much about Safari I didn’t want to make any statements that might be untrue.

system · June 16, 2006, 4:09pm

No worries. If you’d like me to do some extensive testing + screenshots, you know where to find me.

zcorpan · June 16, 2006, 5:52pm

I think Safari doesn’t include application/xhtml+xml in its Accept header (it only states that it supports */*).

system · June 18, 2006, 3:37pm

So in other words there is no point in using XHTML unless you want to be more incompatible?

Enoch_Root · June 19, 2006, 9:21am

I can’t leave this subject alone!

I’ve referred my colleague to these issues and he is dismissive - he’s happy to carry on using XHTML served as text/html, and is “fairly sure” we have followed the compatibility guidelines (even though I’m fairly sure he hadn’t heard of them until I brought them up). I’m fairly unsure we, the developers, have followed the guidelines, and I’m utterly convinced that users adding content with Contribute wouldn’t have a clue what XHTML was, never mind stick to the guidelines. I guess there will come a time when it will all blow up in our faces - presumably this needs:

1. Servers to serve XHTML as application/xhtml+xml; and
2. All (major) browsers to recognise it
When is that likely to be? The reality of serving XHTML as text/html, then, is that we have to try and keep on top of these issues, and regularly validate our pages, and make sure we never lose sight of this issue. In short we are putting added pressure on ourselves - if we actually care, that is. I think we certainly should care. But people are going to look at this as a “maybe” issue, for some time in the future, if and when the two conditions above are met. And they won’t see it as a major issue. Do we try to convince them, or just take the opportunity to act like the big hero when it all goes pear-shaped and you’re the only guy in your organisation who knows why?

AutisticCuckoo · June 19, 2006, 9:56am

It’s going to be several years before ‘all’ major browsers recognise XHTML, even if Microsoft launches an XHTML-compliant IE8 next year. They tend to require the latest versions of Windows for their ‘upgraded’ versions, and lots of users have neither the means nor the wherewithal to replace Windows 98 with XP or Vista. Depending on your target audience, you’ll probably have to expect a significant share of users with non-XHTML-compliant browsers.

There is absolutely no point in writing XHTML markup if the document doesn’t work when served as an application of XML (to compliant user agents). If you need to serve it as text/html for the foreseeable future, that’s fine. Just make sure that it also works as XHTML. It’s easy to achieve: provide a mechanism for serving it with an XML MIME type and verify in a compliant browser like Opera or Firefox.

If it doesn’t work, you’re relying on HTML-only features and should use HTML markup. Even the ‘future-proofing’ argument loses all validity under those circumstances.

Enoch_Root · June 20, 2006, 8:33am

Are you saying we should use content negotiation, or are you saying something else?

AutisticCuckoo · June 20, 2006, 8:55am

No, as I said in the FAQ, content negotiation is pointless. Use HTML 4.01 Strict, or use XHTML 1.0 Strict served as text/html. In the latter case, make sure to verify that it still works when served as application/xhtml+xml, even if you serve it as text/html to all user agents.

ganseki · June 20, 2006, 4:01pm

I’ve just started to follow this topic and thought I’d see what kind of results I got by ‘converting’ something over to application/xhtml+xml: http://www.seikadojo.co.uk

I opened it in Firefox and thought wow it works… cool! Next try IE and see how badly it’s broken. Very surprised to see it looking OK. So I ran it through the W3C validator and it’s reporting that it’s text/html.

I’m not too bothered, at this stage, about content negotiation as I mostly want to see how much work it will take to ‘fix’ a site. I have looked at the PHP scripts on autisticcookoo.net (wicked name btw!!) and I’m sure I can convert that to ASP when I need to.

But first how do I get my site to be served as application/xhtml+xml??

zcorpan · June 20, 2006, 6:09pm

With ASP you could use:

<% Response.ContentType = "application/xhtml+xml" %>

Jak · June 20, 2006, 6:53pm

Awesome thread, thanks Tommy. :tup:

kosta · June 20, 2006, 7:28pm

Wow, great FAQ Tommy! Thank you so much