XML declaration in HTML/XHTML

anon80411031 · July 17, 2011, 7:24pm

I’m reading a book that says that the following declaration


<?xml version="1.0" encoding="UTF-8" ?>

must be included if we are writing XHTML code, but shouldn’t be included in HTML code because it wouldn’t be correct. Why would it be wrong?

C_Ankerstjerne · July 17, 2011, 9:43pm

Because HTML isn’t XML.

HTML is based on SGML. SGML is an old standard, and is used to define markup languages for all sorts of purposes. XML is also based on SGML, but is completely separate from HTML. HTML allows for a much wider variety of SGML features than XML does. XML’s main features are a very simple syntax and draconian error handling, which makes it suitable for marking up error-intolerant documents and also makes it very easy to learn.

During the XML craze of the late 20th and early 21st century, someone had the idea of making an XML version of HTML: XHTML. The idea was that, since XML was already being used a lot in various applications (such as vector graphics (SVG) and mathematical markup (MathML)), this would make it possible to easily extend the HTML page to natively include such additional elements.

The only problem was browser vendors and page authors. Page authors were told that XHTML was the new black, and that everyone had to start using it, or they’d miss the cultural revolution of Web 2.0. Nobody really knew why they were using XHTML rather than HTML, except that everyone else was doing it. As a result, everyone did it wrong (and still does). At the same time, the browser vendors couldn’t exactly build their web browsers so that they were incapable of displaying XHTML pages, as long as their competitors didn’t do it as well.

The XML declaration is not necessary, unless you use XHTML 1.1 (which you shouldn’t). With HTML 5 supporting the general concepts that originally gave birth to XHTML, XHTML will die a slow and painful death over the next 20 years. Until then, HTML 4.01 is still the latest standard advisable for public websites.

anon80411031 · July 21, 2011, 6:10pm

Thank you for your reply!

Just a couple of things: why do you say that I shouldn’t use XHTML 1.1?
And why shouldn’t I add the XML declaration if I’m working with XHTML 1.0?

gary_turner · July 22, 2011, 2:21pm

I see that IE9 does render an xhtml page served as application/xhtml+xml. The question remains, is it really recognizing the content type header, or is it ignoring the server response header and parsing the document as html? I have neither the time nor energy to write a DTD extension to test the question.

cheers,

gary

gary_turner · July 21, 2011, 8:13pm

XHTML 1.1 should (must?) not be served as text/html. It requires application/xhtml+xml as the MIME type. No version of IE supports xhtml, nor, says MSFT, will any future version. So, xhtml is effectively dead.

When using xhtml 1.0, the xml declaration is redundant unless you need a different declaration. The defaults are xml 1.0 and utf-8, so no need to specify. And that’s if you’re actually serving it up as xhtml. As html, which is what you serve if you want IE to render it, the declaration is invalid.

Use of the declaration will cause IE6 to run in quirks mode where it gets even more stupid than it naturally is.
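As a rough illustration of the point above, here is a minimal Python sketch that drops a leading XML declaration before a document goes out as text/html, so the doctype is the first thing the browser sees. The helper name `strip_xml_declaration` and the sample page are made up for this example, and it assumes the page is UTF-8, so nothing is lost by removing the declaration.

```python
import re

# Matches an XML declaration at the very start of the document,
# together with any whitespace that follows it.
XML_DECLARATION = re.compile(r'^<\?xml\s[^>]*\?>\s*')


def strip_xml_declaration(document: str) -> str:
    """Return the document without a leading XML declaration (hypothetical helper)."""
    return XML_DECLARATION.sub('', document, count=1)


xhtml = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    '<body><p>Hello</p></body></html>'
)

as_html = strip_xml_declaration(xhtml)
print(as_html.splitlines()[0])  # the doctype is now the first line
```

A document that has no declaration passes through unchanged, so the helper is safe to run on every page.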

cheers,

gary

anon80411031 · July 22, 2011, 7:05am

This is perfectly clear now. Thanks for your help

xhtmlcoder · July 22, 2011, 11:30am

In general, ‘application/xhtml+xml’ should be used for XHTML Family documents. With XHTML 1.1 it’s a SHOULD NOT serve as ‘text/html’.
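For what it's worth, the MIME type decision can be sketched as a small piece of content negotiation. This is a deliberately simplified Python sketch, not a full implementation of HTTP negotiation: the function name `choose_content_type` is hypothetical, and the text/html fallback only makes sense for XHTML 1.0 written with HTML compatibility in mind, not for XHTML 1.1.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML 1.0 page (hypothetical helper).

    Returns application/xhtml+xml only when the client lists it explicitly
    with a non-zero q value; otherwise falls back to text/html.
    """
    for item in accept_header.split(','):
        parts = [p.strip() for p in item.split(';')]
        media_type = parts[0].lower()
        q = 1.0  # a media range without a q parameter counts as q=1
        for param in parts[1:]:
            if param.lower().startswith('q='):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable q value as "not acceptable"
        if media_type == 'application/xhtml+xml' and q > 0:
            return 'application/xhtml+xml'
    return 'text/html'


# A client that lists the XHTML type explicitly.
print(choose_content_type('text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'))
# A client that only sends a wildcard gets the text/html fallback.
print(choose_content_type('text/html, */*'))
```

The sketch ignores wildcards on purpose: a client that only sends `*/*` has not said it can handle XML, so it gets text/html.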

When the XML declaration is not included in a document, AND the character encoding is not specified by a higher level protocol such as HTTP, the document can only use the default character encodings UTF-8 or UTF-16.
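A small Python sketch may make the encoding rule concrete. It uses the standard library's `xml.etree.ElementTree` as a stand-in for a browser's XML parser (an assumption, browsers use their own parsers) and feeds it raw bytes with no HTTP header involved; the sample paragraph is made up.

```python
import xml.etree.ElementTree as ET

body = '<p xmlns="http://www.w3.org/1999/xhtml">café</p>'

# UTF-8 bytes, no declaration: the parser assumes UTF-8 and reads the text correctly.
utf8_text = ET.fromstring(body.encode('utf-8')).text

# ISO-8859-1 bytes, no declaration: the byte for "é" is not valid UTF-8,
# so the parser rejects the document.
try:
    ET.fromstring(body.encode('iso-8859-1'))
    latin1_without_declaration_ok = True
except ET.ParseError:
    latin1_without_declaration_ok = False

# The same ISO-8859-1 bytes, now announced by an XML declaration.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode('iso-8859-1')
latin1_text = ET.fromstring(declared).text

print(utf8_text, latin1_without_declaration_ok, latin1_text)
```

So the declaration only earns its keep when the document is in something other than UTF-8 or UTF-16 and nothing else (such as the HTTP header) states the encoding.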

Apparently M$IE 9 is supposed to already support ‘application/xhtml+xml’ and when that Fred arrives it will have to support that MIME too, to comply. XHTML 1.0 Appendix C covers backwards compatibility with HTML user-agents. Fred is not backwards compatible.

C_Ankerstjerne · July 22, 2011, 4:36pm

Gary
The problem is not error tolerance, but error handling. Public documents need to be error tolerant. Otherwise, you end up with some CEO on the phone, screaming ‘Ohmygodohmygodohmygod, the entire website is down, we are losing eighty billion dollars every second, fixitfixitfixitfixit, whyisn’titfixedyet, I’m going to sue you and your company and repossess your car and your house and your wife and your children’. At this point, I’m fairly certain that the explanation ‘No, choosing that technology didn’t really have any advantages, but methinks them end tags are really pretty, don’t you?’ just won’t cut it. And the fact remains, unless you have some very specific needs, like embedded vector graphics or mathematical formulae, XHTML offers no inherent technological advantages.

D3V4
Correct. And since you should never use XHTML 1.1, you will never need the XML declaration

C_Ankerstjerne · July 22, 2011, 2:28pm

That’s easy to test: An XML parser must not attempt to recover or correct syntax errors, so simply upload an XHTML document with a syntax error. If Internet Explorer 9 stops rendering the page after the syntax error, then it handles the header correctly.

(This, by the way, is the exact reason why XHTML should not be used).
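The same contrast can be seen outside a browser. The Python sketch below is only an analogy: `xml.etree.ElementTree` stands in for a browser's XML parser and `html.parser.HTMLParser` (a plain tokenizer, not a real HTML rendering engine) stands in for tolerant HTML handling. The broken sample markup and the `TextCollector` and `parses_as_xml` names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The <em> element is never closed, so this is not well-formed XML.
broken = '<html><body><p>Before the <em>error</p><p>After the error</p></body></html>'


class TextCollector(HTMLParser):
    """Collects text nodes while tolerating malformed markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


collector = TextCollector()
collector.feed(broken)
collector.close()

print('XML parser accepted it:', parses_as_xml(broken))
print('HTML parser saw text:', collector.chunks)
```

The XML parser refuses the whole document at the mismatched tag, while the tolerant parser still reaches the text after the error.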

gary_turner · July 22, 2011, 2:54pm

I didn’t think of that; maybe because I never make syntax errors. I wish.

IE9 does stop rendering after the point of error, so it is apparently using xml. It does not issue an error message, at least not on the page itself. Opera has a sweet error message, with no rendering, and Chrome has an error message and renders to the error. Firefox’s message is rather cryptic.

Contrary to your feelings, my own are that html should have had little or no error tolerance from the beginning. Oh well. To each his own.

cheers,

gary

anon80411031 · July 22, 2011, 2:56pm

To summarize, I should add the XML declaration iff I use XHTML 1.1, right? ^^

gary_turner · July 22, 2011, 6:27pm

Are you speaking of file corruption? That’s a different issue. Because surely you wouldn’t go public with an error-laden document, right? Yet, under historic and current error philosophies the web is full of documents that would embarrass a reasonably bright 10-year-old. While we don’t blink an eye if a missing brace stops our script from running, or a malformed query causes a DB failure, we’d get all bent out of shape if our POS markup wouldn’t render; and this in a markup language so simple that a 10-year-old can do it.

As for usage, your examples are good, but fairly uncommon; probably more suited to LaΤεΧ than (x)html. I’d suggest, as an example, that attribute-torturing microformats would have been totally unnecessary had xhtml been supported by IE. The same microformat working group could have been publishing xhtml DTD extensions that would have provided well structured groups of elements with high semantic value. Instead, we got elements whose intrinsic semantic values were over-ridden by attributes, such as class, rel, and rev using non-standard values. What a cluster-bleep.

gary

xhtmlcoder · July 22, 2011, 7:07pm

I think Christian was referring to the beauty of XML, i.e. well-formedness, ALSO being its Achilles heel: when an XML parser meets malformed x(ht)ml on a public-facing website, mainstream browsers ‘halt on error’ or show the ‘Yellow Screen of Death’.

I don’t have IE9 but I believe you have to change the settings slightly for it to spit-out errors for violations of well-formedness, etc.

The fact that some would say Browsers accept a lot of markup nonsense but still try to output a webpage is part of why the web grew exponentially.

C_Ankerstjerne · July 22, 2011, 7:38pm

Exactly right. HTML sets a very low entry level, which is good in terms of online diversity. Sure, the poorly written pages might not render just right, but this is much better than the alternative; that is, that half the websites online wouldn’t exist, because would-be webmasters gave up before they got started.

gary_turner · July 22, 2011, 7:50pm

I understand that. My point is, why would malformed markup be put on the server at all? If you’re going to take advantage of the power of xhtml, are you going to upload error-laden source? I think not. If you upload documents with well-formed markup, draconian error handling is not an issue, unless the file has become corrupted. And that’s a different issue.

“I don’t have IE9 but I believe you have to change the settings slightly for it to spit-out errors for violations of well-formedness, etc.”
Hmm, I may look into that. Or not. IE is not in use here except for x-browser checking.

“The fact that some would say Browsers accept a lot of markup nonsense but still try to output a webpage is part of why the web grew exponentially.”
Likely true. It’s just that the grammar bar for html is so low to begin with, there’s no excuse for html markup errors.

cheers,

gary

C_Ankerstjerne · July 22, 2011, 8:08pm

There can be all sorts of reasons why it happens. Someone updating something in one place that makes unexpected changes somewhere completely different, for instance. If there’s room for human error, it will happen, and quite often too.

gary_turner · July 22, 2011, 8:24pm

Christian, that’s not an xhtml issue, it’s an access control issue. Is it common for every Tom, Dick, and Harry to have editorial access to server files? I’ve only administered one intranet that made heavy use of xml and xhtml documents, so maybe mine was a minority case, but access to the files was strictly limited.

cheers,

gary

C_Ankerstjerne · July 22, 2011, 9:09pm

But what about large online companies like eBay, Amazon and Facebook? They have hundreds, if not thousands, of developers.

felgall · July 22, 2011, 9:36pm

If they can’t write valid, error-free code (even when feeding what they’ve written through a validator to find all the errors for them) then they shouldn’t be in that job.

C_Ankerstjerne · July 22, 2011, 11:04pm

You’re talking about an ideal world now. While I agree that it would be nice if all code was error-free, the fact remains that there’s little reason to believe this will ever actually be the case.