HTML 4.01 Transitional DTD is the answer?

system · July 10, 2010, 10:01am

i just want to lay down a little thing that bugs me. this thread (among others by AutisticCuckoo) provides us with healthy knowledge. i want to say thanks to the author again. he has done a great job laying down crystal clear facts for us.

in this thread, there are two paragraphs:

The HTML 4.01 Strict DTD emphasises the separation of content from presentation and behaviour. This is the DTD that the W3C recommend for all new documents.

The HTML 4.01 Transitional DTD is meant to be used transitionally when converting an old-school (pre-HTML4) document into modern markup. It is not intended to be used for creating new documents.

i want to ask this:

even if we respect DTD for HTML 4.01 Strict in writing the code for a web page, the moment we start adding elements only as presentational hooks, like div, span, or any other element for that matter, solely for the sake of presentation, just to be targeted later on with css (rounded-corner techniques and the like), aren’t we, in fact, writing an HTML 4.01 Transitional web page? even if we don’t use those “11 presentational element types and a plethora of presentational attributes that are deprecated in the Strict DTD”, isn’t our code in fact describing a Transitional DTD?

going further with this thought, isn’t this also affecting the XHTML world the same way? with all that strictness, the moment XHTML markup is polluted with divs and spans as presentational hooks, can it really still be called XHTML? not technically speaking, of course, but as a concept, does it still hold its meaning?

system · July 22, 2010, 3:53pm

it does. how much can you abuse other xml applications? can you make browsers treat svg differently, based on a parameter? how it is used changes that: by UAs, by authors. the specification is only a recommendation, and that makes xhtml not xml.

and of course i’m not referring to the way you use it. i’m sure you are a better web programmer than i am.

Mittineague · July 22, 2010, 3:50pm

The point I’m making is that XHTML is XML
from A Reformulation of HTML 4 in XML 1.0

This specification defines the Second Edition of XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4. …

How it might be used, misused or abused doesn’t change that.

system · July 22, 2010, 3:31pm

xhtml will be xml when it is always treated and parsed as xml. it’s not xhtml because it can be parsed as xml and as html also, is it? it’s xhtml because

isn’t it?

would it be treated like one if served as application/xhtml+xml?

true. i don’t see where this is going, though, related to xhtml.

Mittineague · July 22, 2010, 3:23pm

I purposely bugged my index page (for a few seconds) by taking out a closing unordered list tag to get this in Firefox

Note that it doesn’t say XHTML error.

I don’t understand how you can say it isn’t XML when clearly it is.

Sure, I could remove the opening declaration and then it wouldn’t be XML.
But I could remove the opening declaration from an XML file too and then that wouldn’t be XML either.

system · July 22, 2010, 1:47pm

and also theoretically xhtml would still be good while xml will not (see below). so maybe it’s conforming but it’s not xml.

no, that would be a string that resembles xml. and xhtml sent as ‘text/html’ still translates to something rather than a string: it can be parsed, doesn’t need escaping. that’s why i’ve said this before felgall’s example:

many (and i was, at first, one of them) understand xhtml the way a UA has to, as a two-lane highway: text/html or maybe, if i want, application/xhtml+xml. well, that’s not quite right. it’s a one-way highway. that, and the common wrong use, means the term xhtml is not associated with xml. it’s just a mongrel. and i believe by the time ie8 dies, xhtml will have lived its days as a favourite pet. that doesn’t affect its presently insignificant usability compared with its initial higher purpose.

xhtmlcoder · July 22, 2010, 1:18pm

It is an application of XML, a reformulation of HTML, thus all XHTML documents are XML conforming.

Yes, they should begin with an XML declaration and it uses XML type grammar and a namespace and is at bare minimum well-formed markup. Ideally it should be sent through an XML Processor but if it gets sent as ‘text/html’ then obviously it won’t halt on errors and isn’t treated as x(ht)ml.

<?xml version="1.0" encoding="UTF-8"?>
<something>
  <p>
    <img src="example.png" alt="*"/>
  </p>
</something>

But theoretically you could do the exact same with XML (see the code example above) and send it as ‘text/html’; obviously completely pointless, but possible.

system · July 22, 2010, 12:29pm

thanks xhtmlcoder. that’s the point i was getting at

xhtml is treated like xml by an xml parser. but you can choose to parse it as html; when invalid, when desired. and the <?xml…> can be omitted. you can’t fool around like that with xml in a real world application. so xhtml is not xml.

xhtmlcoder · July 22, 2010, 9:22am

Well, that is easy: just get a well-formed XHTML document and delete a closing tag, for example the closing </p> in Stephen’s ‘example page’, then serve it as application/xhtml+xml.

Else you may be able to “cheat” if you have an XHTML-compatible browser like Firefox, by changing the extension to *.xht and viewing it offline. It will probably assume it is an x(ht)ml document and “try” to render it as such… but of course it won’t be able to render the page without displaying a “warning”.

For example you may see something like:

XML Parsing Error: mismatched tag. Expected: </p>.
Location: file:///C:/realxhtml.xht
Line Number 7, Column 3:
</body></html>
--^

Because it uses a real XML Processor.
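The difference between a real XML processor and a forgiving tag-soup parser can be tried outside a browser too. The following is only a rough sketch in Python: `xml.etree.ElementTree` stands in for the XML processor and `html.parser` for a lenient HTML parser (it is a simple tokenizer, not what browsers actually use), and the broken sample markup is made up for the illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: a paragraph whose closing </p> has been deleted.
BROKEN = "<html><body><p>Hello</body></html>"


class TagCollector(HTMLParser):
    """Records start tags; html.parser does not raise on mismatched tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_xml(markup):
    """Return None on success, or the parser's error message on failure."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as err:
        return str(err)


def parse_as_html(markup):
    """Return the list of start tags seen by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


xml_error = parse_as_xml(BROKEN)   # an error message: the XML parser gives up
html_tags = parse_as_html(BROKEN)  # the HTML parser carries on regardless
print("XML parser:", xml_error)
print("HTML parser saw:", html_tags)
```

The same bytes are fatal to one parser and acceptable to the other, which is the behaviour the Firefox message above reflects.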

system · July 22, 2010, 8:11am

thanks for info and link, felgall. could you please provide also a link to a “broken” xhtml? i would like to see how it renders (or not) in browsers that support application/xhtml+xml. thanks.

felgall · July 22, 2010, 7:11am

Yes they are since one error in an XHTML page will cause the page to not render - because XHTML is XML and applies all the same rules with the addition of those that apply to the specific variant of XML that goes by the name XHTML.

You are getting it confused with HTML which does allow you to get away with errors in the markup such as omitting tags or using the wrong doctype (such as attaching an XHTML doctype to your HTML web page).

To determine whether a web page is HTML or XHTML you look at the MIME type - if it is text/html then it is HTML and if it is application/xhtml+xml then it is XHTML.

All modern browsers with the exception of IE8 (and earlier) can process web pages written using XHTML (including IE9). If you try to serve a page written in XHTML to IE8 it will offer it for download because it doesn’t know how to display it. Make a single mistake in your XHTML markup and because it follows XML rules you end up with nothing displaying at all.

See http://www.felgall.com/realxhtml.php for a very simple page written using XHTML and you will see exactly how different browsers treat it.
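The MIME-type rule described in this post can be sketched as a small lookup. This is a simplified illustration in Python, not how any particular browser is implemented; the function name `parser_for` and the "download" fallback (standing in for the IE8 behaviour mentioned above) are assumptions made for the example.

```python
def parser_for(content_type):
    """Pick a parsing mode from a Content-Type header value (simplified)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"       # strict: one well-formedness error stops rendering
    if mime == "text/html":
        return "html"      # lenient: errors are recovered from
    return "download"      # assumed fallback for a type the UA cannot display


for header in ("text/html; charset=utf-8", "application/xhtml+xml"):
    print(header, "->", parser_for(header))
```

Note that the doctype plays no part in the decision here; only the header does.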

system · July 22, 2010, 5:26am

it probably is. depends on the programmer. while you can get away with lousy xhtml, if you use xml for other than a web page, you cannot make mistakes, or you’ll be punished swiftly. that’s why xhtml does not equal xml. an invalid xhtml document will always be used, while an xml malformed document will never be usable. these two are not synonymous.

and i don’t know if every xml document out there has an <html> tag in it. xhtml declares itself to be xml, but nobody cares much whether it is so.

felgall · July 12, 2010, 10:04pm

Perhaps what the OP is actually looking for is an HTML LINT. That will do a massive cleanup on their HTML 4 strict code to make it look more like a semantically correct web page is supposed to look.

xhtmlcoder · July 12, 2010, 7:43pm

HTML Lints will generally “trim” completely plain empty content elements, i.e. <p></p> and so forth, but obviously if they have an attribute value they will be left intact.
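A lint rule of that kind might look roughly like the sketch below. It is written in Python with the standard `html.parser` module and is not taken from any real lint tool; the class name, the `find_empty` helper and the list of void elements are assumptions for the illustration. It only reports candidates and ignores mismatched markup.

```python
from html.parser import HTMLParser

# Elements that never have content or an end tag in HTML 4 (skipped below).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class EmptyElementFinder(HTMLParser):
    """Reports elements that have no attributes and no content at all."""

    def __init__(self):
        super().__init__()
        self.stack = []   # entries: [tag, has_attrs, has_content]
        self.empty = []

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.stack[-1][2] = True   # the parent now contains a child
        if tag not in VOID:
            self.stack.append([tag, bool(attrs), False])

    def handle_data(self, data):
        if self.stack and data.strip():
            self.stack[-1][2] = True   # non-whitespace text counts as content

    def handle_endtag(self, tag):
        if not self.stack or self.stack[-1][0] != tag:
            return   # mismatched markup is ignored in this sketch
        name, has_attrs, has_content = self.stack.pop()
        if not has_attrs and not has_content:
            self.empty.append(name)


def find_empty(markup):
    finder = EmptyElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.empty


print(find_empty('<div><p></p><p class="spacer"></p><p>text</p><span></span></div>'))
```

The `<p class="spacer">` is left alone because it carries an attribute, which matches the behaviour described above.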

AutisticCuckoo · July 22, 2010, 5:04am

Because that’s all it’s asked to do. But it could also use its built-in knowledge about headings to generate an automatic table of contents (some screen readers can do this, I believe). The point is that it ‘knows’ a closed set of element types and there is no way, except plug-ins and similar, to extend that set.

Sir Tim’s dream about a semantic web relies on user agents ‘understanding’ the markup. Homegrown tag names will not make that possible.

Without semantics a web page is nothing more than a picture to look at. It has no meaning for applications which cannot interpret the visual information.

It isn’t necessarily an address, but it is – at least it’s supposed to be – contact information. Browsers ‘know’ this and could – in the ideal world where web authors knew what they were doing – use this to provide a means for getting in touch with the author. They couldn’t do that if you chose to call it <postadress> and I chose to call it <contact> and Gary chose to call it <getintouch>, though.

Element types must be pre-defined and known to be useful. Web semantics is about making information meaningful to software, not about using markup that is human-readable.

Yes, that’s exactly the problem: we’d have to tell the application what <city> and <street> mean. To a general-purpose UA, such as a browser, it’s nonsense. If we use <span> with class attributes, however, we give even browsers at least a hint about what it is.

Microformats is a way to add something approaching semantic meaning in a way that doesn’t break backwards compatibility, which adding new element types would do.

No!
<city> has no semantics to a browser because it’s completely unknown. It could equally well be <town> or <stad> or <cité> or <fztwqln>.

Whereas <span class="city"> at least tells the browser that it’s a span of characters that has some sort of special meaning.
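To make the point about class attributes concrete, here is a rough Python sketch of how a piece of software could pick address parts out of ordinary HTML once the class names are agreed in advance. The class names follow the hCard microformat (`street-address`, `postal-code`, `locality`); the sample markup, the `ClassTextCollector` class and the `extract` helper are made up for the illustration, and nested spans are not handled.

```python
from html.parser import HTMLParser

# Hypothetical sample using hCard-style class names on plain spans.
SAMPLE = (
    '<div class="adr">'
    '<span class="street-address">Hondenpoeplaan 5</span> '
    '<span class="postal-code">1234 GG</span> '
    '<span class="locality">Borculo</span>'
    '</div>'
)


class ClassTextCollector(HTMLParser):
    """Collects the text of <span> elements keyed by known class names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None   # class name of the span being read, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self.current = name
                self.found[name] = ""
                break

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None


def extract(markup, wanted=("street-address", "postal-code", "locality")):
    collector = ClassTextCollector(wanted)
    collector.feed(markup)
    collector.close()
    return collector.found


address = extract(SAMPLE)
print(address)
```

The extraction only works because the set of class names is fixed beforehand, which is the same argument made here about element types: the vocabulary has to be known to the software, not just readable to a person.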

Yes, but semantics is not about making markup easy to read for people!

Now you’re being silly, Gary. The UA style sheet is an integral part of the browser. Even Lynx has one, after a fashion.

So tell me how an XHTML-supporting browser (plus a search engine 'bot and a screen reader) would be able to understand arbitrary homegrown markup. You can’t specify semantics in a DTD, only grammar.

That is where you are wrong, Gary. The markup tells user agents what the content is; not human users. <city> isn’t semantic because an English-speaking person understands the word.

I’ll say it again, since it seems that this is a point that a lot of people are missing:

Web semantics is about conveying meaning to software, not about writing markup that is self-documenting for human beings.

I know it’s hard to believe for us professionals, but most users out there actually don’t even look at the source code of most sites they visit!

Mittineague · July 21, 2010, 11:58pm

I was under the impression that XHTML is XML.

The first line my pages serve to browsers that HTTP_ACCEPT application/xhtml+xml after the HTTP headers are sent is:

<?xml version='1.0' encoding='utf-8'?>
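A rough sketch of that kind of server-side switch, in Python rather than whatever the site actually runs: the helper names `accepts_xhtml` and `negotiate` are hypothetical, and the Accept-header handling is deliberately simplified (it only looks for an explicit application/xhtml+xml entry and its q value).

```python
XML_DECLARATION = "<?xml version='1.0' encoding='utf-8'?>\n"


def accepts_xhtml(accept_header):
    """True if the Accept header lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0   # default quality when no q parameter is given
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def negotiate(accept_header, body):
    """Return (content_type, document) for a request's Accept header."""
    if accepts_xhtml(accept_header):
        # XML-capable UA: send the XML declaration and the XHTML MIME type.
        return "application/xhtml+xml; charset=utf-8", XML_DECLARATION + body
    # Everyone else gets the same markup as text/html, without the declaration.
    return "text/html; charset=utf-8", body


content_type, document = negotiate(
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "<html xmlns='http://www.w3.org/1999/xhtml'></html>",
)
print(content_type)
```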

system · July 21, 2010, 9:44pm

i beg to differ. the real power of xhtml is xml. and xml could easily be put to work if the <xml> element had been implemented across browsers. much easier than three (useless) whole new specs, each with its problems.

an example of a possible xml island, related to one presented earlier:

<xml class="address">
<name>Joe</name>
<streetaddress>Hondenpoeplaan 5</streetaddress>
<postalcode>1234 GG</postalcode>
<city>Borculo</city>
<country>People's Republic of Foo</country>
</xml>

sounds a whole lot better than the redundancy employed by the false extensibility of xhtml. you can wrap pretty much every xml sequence in there, and still get away with semantics. the content of an element has little to do with that. once wrapped between <xml> tags, the content stops being about tags defining elements, and the rest being the content. all that is now content.
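For what it is worth, a generic XML parser reads the island above without complaint. The Python sketch below uses the standard `xml.etree.ElementTree` module; the `read_island` helper is made up for the illustration. What comes back is only a set of names and strings, so any meaning of `city` or `country` still has to be supplied by whoever consumes the result.

```python
import xml.etree.ElementTree as ET

# The island from the post above, copied as a string.
ISLAND = """<xml class="address">
<name>Joe</name>
<streetaddress>Hondenpoeplaan 5</streetaddress>
<postalcode>1234 GG</postalcode>
<city>Borculo</city>
<country>People's Republic of Foo</country>
</xml>"""


def read_island(markup):
    """Return the child elements of the island as a {tag: text} mapping."""
    root = ET.fromstring(markup)
    return {child.tag: child.text for child in root}


fields = read_island(ISLAND)
for tag, text in fields.items():
    # To the parser these are opaque names paired with opaque strings.
    print(tag, "=", text)
```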

given the right implementation, one that works around those tags inside xml islands and does something along the lines of the <dl> element, providing two types of subelements to describe the xml element (xml-dt, xml-dd): info on the name and use of the custom tags, and the actual xml sequence. this is something i have been looking for for four years, and i have hoped until this day that it will become real. still do. and it’s better for me than xhtml.

on the other hand, a newborn breed of elements that can be tagged with any mumbo-jumbo string will leave you puzzled and overwhelmed. and 90% of the time they only carry data for lazy programmers looking for shortcuts. or confuse vocabulary with lexical elements.

system · July 12, 2010, 12:06pm

this is probably the most flip-the-coin-under-the-table technique i have had to counter thus far. when i say divs and spans are to give structure to the document, i get a counter saying these are semantically neutral elements that can be used for presentation. when i say that these elements can be used (only, in my opinion, for strict) as simple presentational hooks, i am reminded that they are in fact for giving structure to the document.

i understand the specs: div and span give structure and provide a neutral container that can be used for presentational purposes without affecting the semantics of the document.

the weak point i see when one uses divs and spans: the specs should have said something along the lines of:

We discourage authors from using empty P elements. User agents should ignore empty P elements

that was a common bad technique, using <p>s the wrong way, that was spotted and amended in the specs. if the use of empty, nested empty, or neighboring empty divs and spans had been a common technique for achieving purely presentational feats, outside the content concern, at the time these specs were written, i’m sure something similar along the lines of the empty <p>s would have appeared in them.

you are right, validators can’t decide over semantics. so that’s why it’s our job to decide further where simple tools fail. but i’m pretty sure that these markup constructs, like certain rounded corner techniques or image replacement techniques, can easily be spotted by a validator and marked as warnings, if nothing else. i get that a transitional DTD for documents with such constructs may appear to some as an extraordinarily bad decision. but my whole point is that we constantly need to reassess what the specs were saying then against what is happening in the present day, along with the future prospects, because a transition is waiting for us in the near future, and it involves, among others, valid strict markup that is in fact a presentational-only construct, a construct that can/will be done with css. the future is not limited to html5+css3. and what yesterday seemed a permanent solution to a problem, today is only a transitional way of resolving it. an adjustment is in order. and i, for one, felt the need to point that out, in a way that i saw fit to put through the grinding machine of other more experienced minds, using the sitepoint forum, to help me discover valid pros and cons.

AlexDawson · July 12, 2010, 9:53am

Well you’ve had the Dutch kitty so I guess it’s my turn to draw swords

I understand exactly what you’re trying to imply throughout the thread and to some extent I agree with you that adding semantically redundant code (such as divitis, spanmania or classitis) does reduce the quality of the code in terms of being clean and tight. However, the thing you seem to have missed is that when DIV and SPAN are used (even when they’re used in excess) it doesn’t qualify as transitional markup, because removing extraneous markup isn’t part of what the DTD is about. The W3C validator and the semantic rules for appropriate use of code simply don’t have the contextual awareness to be able to validly say that you’re using too much or too little code to represent your content - that is something which is down to the nature of the content and the design. To imply the Transitional Doctype should cover such relevance would be impossible to maintain or police. Transitional by definition is to ease the transition from one standard (such as outdated practices like accessibility-poor frames or style within structural element usage) in favour of new ones, not to dictate to people what may be classed as situational semantics (as best practices governing when and where elements should be used are very hard to define).

Because DIV and SPAN by default have no strict policy as to what kind of content they should be used within, that gives them added flexibility in that they represent structural boundaries of content with no specified semantic meaning. While it’s easy to say that such elements are inherently stylistic because they don’t hold semantic value, consider the case of Microformats… where semantic relevance is given where no appropriate element exists (such as marking up a vCard), where DIVs and SPANs are wrapped around a certain specified piece of textual importance (like a person’s name) where no <name> element exists to represent it (and it holds no importance beyond how it could be interpreted when reading). This totally counteracts your claim that such usage would specify stylistic attribution alone: elements with no semantic value CAN be given semantic relevance in certain contexts, and they are not transitional just because they’re being used outside of what the specification declares or the syntax of the DTD recognises (they’re just being given additional relevance). Transitional code is about eliminating now-redundant functions which have been updated with better methods (like CSS, which better describes and provides stylistic attribution); it’s NOT about stating that poorly structured code thereby violates some sort of doctrine of hierarchy or context within the content. And they’re not always just wrappers for stylistic code.

system · July 12, 2010, 8:47am

ok Stomme poes, you’re not helping me at all

let me start over, from here.

What is the difference between Strict, Transitional and Frameset?
The difference is which element types and attributes they declare, and how they allow or require element types to nest.

this one proves me way wrong.

The HTML 4.01 Strict DTD emphasises the separation of content from presentation and behaviour. This is the DTD that the W3C recommend for all new documents.

i may have a point for the first part, about separation of content from presentation. the second part, however, is pretty definitive.

The HTML 4.01 Transitional DTD is meant to be used transitionally when converting an old-school (pre-HTML4) document into modern markup.

i may have a point here if i lose (pre-HTML4). this is where i think we need an adjustment in thinking: modern markup.

It is not intended to be used for creating new documents. It contains 11 presentational element types and a plethora of presentational attributes that are deprecated in the Strict DTD.

again, pretty definitive, if you don’t think about what modern markup means today as opposed to what modern markup meant back then.

Which DOCTYPE should I use?
If you are creating a new web page, the W3C recommend using HTML 4.01 Strict.

If you are trying to convert an ancient HTML 2.0 or HTML 3.2 document to the modern world, you can use HTML 4.01 Transitional until you have managed to transfer all presentational issues to CSS and all behavioural issues to JavaScript.

if i reformulate this, thinking in modern markup terms:

Which DOCTYPE should I use?
If you are creating a new web page, the W3C recommend using HTML 4.01 Strict.

If you are trying to convert an ancient HTML 2.0 or HTML 3.2 or HTML 4.01 document to the modern world, you can use HTML 4.01 Transitional until you have managed to transfer all presentational issues to CSS (regarding certain techniques like rounded corners or image replacement techniques) and all behavioural issues to JavaScript.

then i also may have a point.

what are the drawbacks, besides the obvious ones, of using an extended meaning for this “modern” transitional?