Font-size:0 DOES NOT kill white space bug in Saf/Chr

system · November 1, 2010, 4:15pm

Stomme_poes:

Not really: watching my husband struggle with a German client who wanted their newspaper ads (which were archived as PDFs) to be searchable on the web was quite eye-opening in how easy HTML is. In the PDFs, like print, there are no words, no letters even: everything is a glyph, with a size, and a placement on the page with page locators. Your closest “context” is that there is another glyph of x-size whose coordinates may be “close” to the spot your glyph is. Hubby ended up writing a Perl Module (postscript::textDecode) which looked for the largest glyphs, then went through several pages of rules looking for things that should match… luckily, in the PDFs, the person’s name was the largest text but almost never the first text, so a lot could be eliminated by looking at glyph size.
Everything was coded with strange PostScript codes with Adobe proprietary glyph maps. You couldn’t just “read” it, nor could the program. A lot like if you had large, relief words on a wall and you were blind: you could feel the shapes of the letters but it’s not like you could just “read” them.

nice. skilful hubby.

but i would’ve done something simpler. pdf2txt. like this one. a batch pdf2txt utility. there are many. i’m sure a redirect to pdf after would be easy. and the whole process faster.

gary_turner · November 2, 2010, 5:35am

This may meander a bit, as I’m replying to your comments in the order made.

I meant exactly what I wrote; tenuous, as weak, flimsy, lacking substance or consistency. Certainly more sophomoric than sophisticate.

as for the rest, i’m sorry, but you are just grinding stone to hurt my ears.
Would that it would, but you’d have to actually listen.

you’re savy [sic] sounds like infatuation. and if you still believe that a bogus formating [sic] rule …
You keep running on about a formatting rule, bogus or not. What rule is that? The only formatting rules are those we or our employers impose; usually to improve clarity and thus productivity.

… that breakes [sic] the layout is stronger than a sensible solution like removing that formating [sic], you’re just trolling.
The problem isn’t that your solution doesn’t work, it does. The problem is that it depends on formatting which is highly susceptible to change. Since html is format agnostic, there is nothing to enforce that particular format, the maintainer is under no obligation not to naively screw it up while performing his job.

removing whitespace between elements has absolutely nothing to do with html. it doesn’t affect the html in any way. also, whitespace between elements has nothing to do with presentation.
Ah, but in this case, it does. More in a bit.

so, if UAs don’t know how to always handle your format, don’t resort to css. css is not there to corect [sic] your format, but to offer a presentation. that is, in this case you suggest to use css to remove presentation for good, not to alter it, or to make it show later or whatever. correcting formating [sic] is your job, not css’s. ponder about it tenuously griping [sic] the concept, please
The UA ignores the markup formatting. We are dealing with a white space within an inline string, i.e. CDATA. That, my friend, is a word space.

What do you mean, I should think about it with a flimsy gripe (on, with, about, ?) the idea? Sorry, makes no sense unless you meant I should think about it with a strong grip on the idea. But, that’s not what you wrote.

It has become, once again, abundantly clear that you have no understanding of the purpose or design of html. We are not dealing with a white space bug such as lived in IE<7. That did have somewhat to do with the white space, e.g. a line feed, between block level elements that should have been ignored, but wasn’t.

We have been talking about a correct rendering of white space between inline elements. The reason the problem exists is because we have used css to cause block elements to act as inline. Why would you expect a different behavior than that of natural inline elements or inline character data?

adding elements, that’s true manipulation of the markup.
What added element? The white space? That’s not an added element; that’s a part of the inline string. If you think otherwise, go back to the kiddie pool, you’re not ready to swim in the deep end.

That’s why it’s reasonable and safe to eliminate spacing between natural inline elements. Its effect and purpose is obvious. That is not true of the same hack for natural block elements, since they only act as inline due to the css display property. It follows that a css solution is appropriate to a css induced issue

yes, you do manipulate whitespace with css. but let’s be clear, that’s presentation whitespace, whether author’s or user’s. the whitespace we discuss it’s not that. it’s a rendering bug of the formating [sic] whitespace. a complet [sic] different type. forced in there by bogus formating [sic] rules.
Again with the non-existent formatting rules. How many times must you be told before you understand; html ignores formatting. We are talking about white space that is content, as it appears within an inline string.

HTML is a declarative structural language. That means it describes the structure of a document. No more, no less. It is up to the UA to use its default style rules, the user’s rules and/or the author’s rules to render the document. Your statement regarding Lynx indicates your total lack of knowledge of that browser. It too has its built in default display rules. I repeat from an earlier post, look in lynx.cfg; hell, modify it to suit your own preferences. Notepad, otoh, doesn’t grok html. It would make a crappy browser (it’s not much of a text editor either). It’s the structure, defined by html, that the UA uses to create the DOM. CSS and js then work against the DOM to do their things.

That the various graphic browsers all render stuff pretty much the same is the result of vendor cooperation or a desire not to be the odd looking one. The latter gets my vote. If you choose tags for their common presentation, you don’t understand the medium.

Aural presentation works the same as for visual. There are built in defaults, and aural style sheets. For example, I might write:

<p>Noonnope is a <span lang="fr">soi-disant</span> html expert.</p>

An aural UA would default to a French pronunciation for the spanned phrase.

Aargh, that’s enough

system · November 2, 2010, 5:52am

see, that’s why i said you are trolling.

we have been talking about formatting whitespace between <div>s. are <div>s inline? no. further more, we talk about the use of display: inline-block. will that make <div>s inline? no.

we were talking about a bogus formatting rule: use newline between the end tag and the start tag of two sibling <div>s. this false rule makes your presentation break. removing it has no ties to managing presentation in markup, since the whitespace we talk is not part of the content. or content whitespace. if you can’t understand why it’s bogus, i can’t make it much clearer than it has been made already.

you? you chimed in your self absorbed unrelated opinion, w/o making any sense. you, for your self importance, started juggling and mixing up with basic concepts. i really don’t know why you try and show me how to change agent style? is there a point? i don’t see it? is it a bright idea? why? why do you keep reiterating obvious presentation and aural concepts? is this your way of closing your eyes and shutting down your ears to hear only your ideas when you don’t get the gist of this thread?

as for the rest, i know enough about you as not to care about your opinion about me and i would not give you the satisfaction of resorting too to self sufficient language and approach.

one more thing you forgot

[I]"cheers,

gary"[/I]

gary_turner · November 2, 2010, 6:49am

we have been talking about formatting whitespace between <div>s. are <div>s inline? no. further more, we talk about the use of display: inline-block. will that make <div>s inline? no.
All elements and character data are inline unless a style rule says differently. %block and %inline et al are entity tokens that are useful for describing structural characteristics. Did you ever try the default stylesheet experiment I suggested? I thought not. Once the stylesheet sets a div to {display: inline-block;}, it is inline externally, and acts like any other inline replaced element.

we were talking about a bogus formatting rule: use newline between the end tag and the start tag of two sibling <div>s. this false rule makes your presentation break. removing it has no ties to managing presentation in markup, since the whitespace we talk is not part of the content. if you can’t understand why it’s bogus, i can’t make it much clearer than it has been made already.
Again, there is no such rule. It is common convention, and to depend on a particular formatting for a presentation fix is simply silly.

Moreover, once the div, li, or other elements are made inline of any sort, e.g. inline, inline-block, or inline-table, the white spaces between them become cdata, inter-word space.
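A minimal sketch of the behavior described above, for anyone who wants to see it in a browser. The class name `box` and the sizes are made up for the demo; the exact width of the gap depends on the font in use.

```html
<!DOCTYPE html>
<title>Inline-block white space demo</title>
<style>
  /* "box" is a hypothetical class name used only for this demo. */
  .box {
    display: inline-block; /* the div now sits in a line like a word */
    width: 100px;
    background: #ccc;
  }
</style>

<!-- Line break between the divs: it collapses to one space, so a gap
     of about one word space shows between the boxes. -->
<div>
  <div class="box">one</div>
  <div class="box">two</div>
</div>

<!-- No white space between the divs: the boxes touch. -->
<div>
  <div class="box">one</div><div class="box">two</div>
</div>
```

Both groups use the same style rule; only the source formatting differs, which is exactly what the two sides of this thread are arguing about.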

My usual sign-off is meant as a friendly gesture. Note that again I don’t use it

system · November 2, 2010, 8:59am

on to the point.

Again, there is no such rule. It is common convention, and to depend on a particular formatting for a presentation fix is simply silly.

since there is no such rule, it’s simply silly NOT to simply fix a particular whitespace nodes problem generated by a particular formatting (no way a presentation problem! presentation is when you target html CONTENT elements and whitespace, this one is FORMATTING whitespace), by removing that no-such-rule formatting (or preserving a certain formatting, or fixing in a few simple steps a formatting that breaks your page), but to depend on a convoluted css fix (that doesn’t depend on whitespace declarations for a fix, if you were kind enough to read the latest css quiz), css solution that will be hard to maintain and to assimilate with a complex css code. by you or by your css solution recipient.

keep it simple … gary

SpikeZ · November 2, 2010, 10:00am

Let’s keep this away from sarcastic poking and mudslinging. We have a pretty good discussion going on and I would like to keep it that way.

PaulOB · November 2, 2010, 10:06am

It’s ok to argue your points as strongly as you like, but please keep the discussions civil; otherwise it ruins an interesting thread.

Stomme_poes · November 2, 2010, 10:17am

since visually it has no importance, why would it aural?

We’re not talking aural. If you actually bothered using one you would know the difference. But you refuse. The tags tell the user WHAT the content is. The tags allow Navigation By Type. You can’t just cover your ears and say “lalala” as loud as possible and then claim that semantics don’t mean anything to human beings, because that’s at the least incredibly insulting to those who rely on AT to access information on the web.

Get a screen reader (there are free ones like NVDA for Windows, Orca for Gnome/Linux, VoiceOver for Mac…).
Go to a page using bad markup and try to navigate it.
Go to a page using good markup and try to navigate it.

(Aural stylesheets are usually ignored… I’m not talking about those (and I don’t bother writing them for that reason).)

offtopic

but i would’ve done something simpler. pdf2txt. like this one. a batch pdf2txt utility. there are many. i’m sure a redirect to pdf after would be easy. and the whole process faster.

How do you find the name of the person then? There’s nothing to work on.

The PDFs were messages sent to the newspaper with news of (birth, death, memorial, wedding, whatever). The styles on the PDF were mostly that the person listed was the largest text. You lose that if you convert PDF to text because all that meta-data is gone. So he needed to use that font-size and other info to “guess” what the person’s name was (he needed to extract the name).
However later he did find something that did try to keep some of the metadata called pdf2html. That might have worked, but it was not known or possibly not available at the time.

system · November 2, 2010, 10:48am

agreed. “The tags show the user WHAT the content is. The tags allow Navigation By Type (visually).”

i see no difference you are ignoring something

[ot]about the mood here. i’m new here at SP but i see the truth in this post:

[/ot]

Stomme_poes · November 2, 2010, 1:47pm

agreed. “The tags show the user WHAT the content is. The tags allow Navigation By Type (visually).”

Uh, if you’re blind, how do you navigate by tags visually?

Did you miss something?

Yes, sighted and partially sighted people can and do also use screen readers, but it’s still not why semantics are important on a page.

system · November 2, 2010, 4:08pm

you are really keen on confusion.

for sighted people: “The tags show the user WHAT the content is.”

for visually impaired people: “The tags tell the user WHAT the content is.”

the same thing. if tag Rendering != “visual appearance” then tag Rendering != “aural representation”.

simple. aural representation relies also on a default speaking style like the visual representation relies on a default visual style. the same rules apply for both visual and aural. you seem to think it doesn’t. for same reasons.

you’ve challenged the visual use of the tags but implied tags alone have an aural use. it’s a contradiction in the way you treat the same Navigation By Type: either visual, as in identifying the elements by scouting the page or aural.

you don’t seem to get one thing: there is no egg or chicken conundrum here. tags are natural descendants of the written system, Gutenberg and before. that means that <h1> is based on the visual representation of the biggest heading text found in a written document, or book.

saying that <h1> has semantics, but that semantics has nothing to do with the visual representation, is denying the history of human writings.

you, like gary, have a logic fault here. you seem to think that the way browsers modularize their system: keeping agent styles at large, in order to help you customize your experience, that this way changes the way written documents are perceived in the web world.

saying that all tags are text is like saying all java interface elements are objects. simply stating that you can alter the default style doesn’t change the fact that this is a feature not an innovation. when altering the default style you still need to consider it further when designing yours.

no, UAs are pieces of software that have no say in the way humans used or will use the written words. their internal rules, their construction and logic is only serving the technology that is the underlying factor.

for the end user the book or the ancient written artifact or the web page have the same system. and it does. you seem to look for wrong hidden meanings where there is none. web semantics, which i believe is what you’re trying to expose, were born after and based on the existing “hard” written system and follow the old rules by every bit.

the problem here is the lack of vision. saying that html tags are used for semantics not for a visual meaning and that some style comes along later to do some rather unnecessary work otherwise, is nullifying the core html concept: text markup. if it’s about text, then it’s about a presentation of that text: visual or aural. having no presentation, default or otherwise, means html could be demoted to just t = text.

presentation and semantic have equal weight. if you get it wrong about semantic you may provide the wrong presentation: visual or aural. if you get it wrong about presentation, you may convey the wrong semantic: h3 bigger text than h1, calm tone describing violent threat.

how easy HTML is

keep some of the metadata called pdf2html

i believe xml is the correct answer to that problem.

xhtmlcoder · November 2, 2010, 5:58pm

[ot]The DTD is the specification that describes the semantics to be ascribed to the markup.

I think there has been some confusion with the language being used; since without the DTD it’s pretty much ‘text’ within angled brackets.

The second concept is regarding the user-agent having a “default style sheet”, which typically includes the presentation.[/ot]

system · November 2, 2010, 6:16pm

actually, it’s pretty simple.

[B]html was born to put text from paper to screen and to link it. the text should look like the one on paper: headings, paragraphs, lists, figures.

hence, putting html tags around your text means presentation. at first, it was only visual. nowadays, it’s also about aural.[/B]

the semantics, Navigate By Type, DTDs, stylesheet, inline, character data, interface etcetera are details of the technology used to make it possible.

these technological details are not changing in any way the initial facts:

[B]html was born to put text from paper to screen and to link it. the text should look like the one on paper: headings, paragraphs, lists, figures.

hence, putting html tags around your text means presentation. at first, it was only visual. nowadays, it’s also about aural.[/B]

it’s really that simple. don’t get lost in the details

don’t mistakenly take implementation details, modularization (default style or such) as new ways to read into this. you are still addressing a human. you are still following the same old rules.

semantics and presentation go hand in hand. html tags were created looking at the presentational features of the different text parts. you use those still.

using html you have however presentation flexibility at low cost (lower than the hard print) and superior spreading speed.

and this may confuse some people.

and finally: looking at the two solutions for our problem. changing formatting vs. changing css.

changing formatting has no ties to the html, css, js, flash, w3c, microsoft, rednecks, Athena. it’s all about a choice in slapping around your code bits in a file. it has no ties to the CONTENT or the CONTENT WHITESPACE.

changing css it’s the first mistake. you’re writing code to fix first rather than to present.

re-formatting may break your intended page result. it’s an easy fix. easy to understand, easy to apply.

re-writing/adding to css may break your intended page result. it’s not an easy fix. it’s not easy to understand, not easy to conserve and not easy to blend with simple or complex css. also, it may have unexpected results, as it has not been tested extensively. future mods in css may prove it has unwanted results. this means further research, further errors, more time lost to debugging.

Stomme_poes · November 2, 2010, 8:14pm

the same thing. if tag Rendering != “visual appearance” then tag Rendering != “aural representation”.

Ah, I finally understand what you mean. However isn’t that the point of semantic tags? To tell the user /user-agent what the content in the tag represents?

I’m saying it matters WHAT tag you wrap around some content, regardless of how one styles it visually (this includes whatever styles are left to the browser).

you’ve challenged the visual use of the tags but implied tags alone have an aural use. it’s a contradiction in the way you treat the same Navigation By Type: either visual, as in identifying the elements by scouting the page or aural

Uh, NO. I don’t rely on CSS to tell someone what the h1 of a page is. I rely on the <h1> tags to tell users that. No contradiction, however yeah I’d say most sighted users with a graphical browser and CSS enabled don’t notice semantic markup at all… they don’t know the difference between <h1>BIG HEADING</h1> and <p><strong><font size=20px>BIG HEADING</font></strong></p>. But there are users who DO rely on semantic tags (tags which correctly describe the content), and screen reader users are some of them.

So yeah, sure, if I was a lazy developer and only wrote for sighted users with graphical browsers and CSS, I wouldn’t have to worry if the main header text was in an h1 or a p. I don’t believe you are arguing that <p> here is a good thing to use but it sure sounds like it from your arguments.

that means that <h1> is based on the visual representation of the biggest heading text found in a written document, or book.

You are saying that because traditionally the heading text of [some document] was large and bold, that this is why in HTML documents we wrap <h1> tags around the header text? Yes. Not because it is large and bold, but because it IS header text. That lots of people (and browsers) choose to make it look large and bold has nothing to do with the reason why TBL (or whoever it was) decided that heading text should be wrapped with heading-text tags.

<h1> does not mean “first large and bold”. It means “heading text, first level”. Regardless if one of my browsers shows it as small and green (which is visual representation).

Here’s the “aural” you keep talking about but don’t listen to:

on page load
“Page has five headers, two tables and fifteen links.” (you wouldn’t get this semantic information if you built everything out of p’s)
I hit “h”, the quick key that takes me to the first heading tag of the page.
“Header level one.”
I hit “h” again.
“Header level two.” Oh, this must be a subheading of the main header I was just on. It must be related information.

This is not “aural styling”. It’s semantics. It explains what the information inside IS.
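To make the contrast concrete, here is a small sketch of the two kinds of markup being compared. The heading text is invented for the example, and how a given screen reader announces each part will vary by product and version.

```html
<!DOCTYPE html>
<title>Semantic versus look-alike headings</title>

<!-- Semantic: the heading elements say what the text IS, so assistive
     technology can count them and jump between them. -->
<h1>Site name</h1>
<h2>First article</h2>
<p>Article text.</p>

<!-- Look-alike: may appear similar in a graphical browser, but to the
     user agent these are only paragraphs, not headings. -->
<p><strong><font size="6">Site name</font></strong></p>
<p><strong><font size="5">First article</font></strong></p>
<p>Article text.</p>
```

With the first version, heading navigation has something to land on; with the second, there are no heading elements in the page at all.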

system · November 2, 2010, 8:56pm

NO, IT WASN’T!

HTML was invented to be able to present content in a device independent manner – be it print, screen, teletype, aural, whatever.

Given the plethora of device dot pitches, device resolutions, etc that were present when TBL took SGML and did something useful with it, your statement is pure nonsense. From 22x21 9 color plaintext on a VIC-20 to 64x16 monochrome on a TRASH-80 to 80x25 16 color DOS/CPM… to 320x200 4 color CGA to 1152x864 on a monochrome NeXT workstation (like the one TBL was using when he made HTML) to 800x600 16 color IBM 8514. Even from daisy wheel to 9pin line-printer to 1200dpi typesetting - The idea was to make it so none of that mattered and have the content fluidly adjust on the fly to whatever the target was. Identical appearance across devices was impractical at best, impossible at worst. As such the point was NOT to make it appear “the same on screen as on paper” – but to custom craft it to appear different but acceptable on both.

The entire POINT of HTML was to say WHAT things were, and let the user agent best determine how to show that meaning. This even includes the B and I tags when it comes to devices that don’t have the equivalents to font-weight or font-style. (which is why B and I are in fact semantic tags that mean bold and italic but NOT font-weight:bold or font-style:italic!) It wasn’t until the browser wars with proprietary tags like FONT (which was later accepted into 3.2 but started out proprietary) that the mere notion of presentational markup came into being.

That you apparently don’t grasp that is likely why you’re the only one defending your viewpoint on this here… You do not seem to grasp the point of semantics OR separation of presentation from content.

Hell, TBL’s original prototype browser even had something much akin to stylesheets – though his would be best compared to user.css today or the stylesheets used in some browsers like firefox, to customize the browser’s behavior for each device it could be run on since the default appearance for one media type might have absolutely nothing to do with the default appearance on another – even if they are both being used to convey the same meaning.

See the old daisy wheel styling of underscores before and after to indicate bold and ~tilde~ before and after to indicate italic… Today that would be done with :before and :after on devices that don’t support font-weight or font-style. (and some teletype devices actually support this!)
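A rough sketch of that idea in CSS 2.1 terms, assuming a hypothetical output device with no bold or italic type. The rules are illustrative only and are not taken from any real user agent stylesheet.

```html
<!DOCTYPE html>
<title>Marking emphasis without bold or italic type</title>
<style>
  /* Pretend the device cannot render bold or italic at all. */
  b, i { font-weight: normal; font-style: normal; }

  /* Wrap the content in plain characters instead. */
  b:before, b:after { content: "_"; }
  i:before, i:after { content: "~"; }
</style>

<p>This is <b>bold</b> and this is <i>italic</i>.</p>
```

The markup stays the same; only the stylesheet decides how the distinction gets shown on that device.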

system · November 2, 2010, 9:02pm

Reading a web page.
Looking at a headline = I hit “h”, the quick key that takes me to the first heading tag of the page.
“Header level one.”

Scouting out the next headline = I hit “h” again.
“Header level two.” Oh, this must be a subheading of the main header I was just on. It must be related information.

This is not “aural styling”. It’s not semantics. It’s visual style explaining what the information inside IS.

i believe this is my cue to get off this train. i’m disappointed, i really am

xhtmlcoder · November 2, 2010, 9:05pm

Not really presentational but structural markup. The presentation is the user-agent stylesheet (which includes differing media).

The idea was to be able to define an interoperable document using a common denominator. Markup should describe a document’s structure and other attributes, rather than specify the processing to be performed on it.

system · November 2, 2010, 9:20pm

Or on source order, or on something other than ‘big words’ since H1 does not mean ‘larger text’… it means the parent heading under which all other headings are the start of subsections. That’s ALL it means.

That means you might do

h1+* { padding-left:2em; }

Like some lynx stylesheets do when color and bold are unavailable. It’s still conveying the meaning.

Or to go back to the newspaper analogy I use all the time. What heading appears on every page (or every other fold) under which all other headings would be subsections? The name of the paper of course. It might appear larger on the front page than the sub-page, but it’s still the topmost heading.

On the front page, you might have

MAYOR CAUGHT TAKING MILLION DOLLAR BRIBE

In giant text… does that make:

K-6 GETS NEW BUILDING

and

OFFICER INJURED AT LOCAL DINER

… subsections of that first article? Of course not. They are all H2 REGARDLESS of what size they are being presented in on the print copy. This is where people screw up the word ‘importance’ as it doesn’t mean ‘more important’ in terms of the content, but in terms of the STRUCTURE – and structurally those are all kin to each-other not subsections of the first! Unless the mayor used the bribe to buy that school and then beat up a cop at the diner…

H2 is the start of a subsection to the h1. If your element that you are putting a heading on is NOT a subsection of the higher order heading preceding it, you have the wrong heading order!

Presentation and default appearance doesn’t even play into it!
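As a sketch, the front page from the newspaper analogy might be marked up like this. The paper name and the `lead` class are placeholders invented for the example.

```html
<!DOCTYPE html>
<title>Front page heading structure</title>
<style>
  /* Size is a presentation choice; it does not change the structure. */
  h2 { font-size: 1.2em; }
  h2.lead { font-size: 3em; }
</style>

<h1>The Daily Paper</h1> <!-- hypothetical name of the paper -->

<h2 class="lead">MAYOR CAUGHT TAKING MILLION DOLLAR BRIBE</h2>
<p>Story text.</p>

<h2>K-6 GETS NEW BUILDING</h2>
<p>Story text.</p>

<h2>OFFICER INJURED AT LOCAL DINER</h2>
<p>Story text.</p>
```

All three stories are siblings under the h1, even though the lead story is printed several times larger than the other two.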

PaulOB · November 2, 2010, 9:24pm

How is that?

Wasn’t it CSS that introduced the white space bug that this thread is all about? I’m sure the problem occurred by changing an element to display:inline-block with CSS.

Before the element was changed with CSS there was no white space to contend with therefore it should be CSS’s job to fix it. There was no white space bug until the CSS was applied in the first place. Or am I mistaken here?

Regardless of the rights and wrongs there are two solutions as I see it (and these are my personal observations relating to my experience).

Method 1) Adjust the html mark up to reduce the gaps between tags.

Pros:
1) Relatively easy to do
2) Doesn’t need css
(Can’t think of any other valid pro reason)

Cons

Dangerously fragile and liable to break as soon as someone edits the code. In my experience 99% of my clients would break this when they add their content or when their developers convert the page to php or add dynamic data, or when they convert the template into a CMS.

I just cannot ensure that everyone in the chain will not break the formatting of the html. The chain may stretch from here to India and back and I have no control over that. (The css would likely remain untouched and intact through all these stages.)

Awkward to work with and difficult to read and administer. Working with html in one long line is virtually impossible. Having to format the code for editing and testing and then reformat for viewing would be a nightmare during the development stage.
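For reference, these are a few commonly seen ways of writing the markup so that no white space sits between the elements. They are sketches, not recommendations, and the class name `box` is a placeholder.

```html
<!DOCTYPE html>
<title>Removing white space in the markup</title>
<style>
  .box { display: inline-block; width: 100px; background: #ccc; }
</style>

<!-- 1. Everything on one line. -->
<div><div class="box">one</div><div class="box">two</div></div>

<!-- 2. An HTML comment swallows the line break. -->
<div>
  <div class="box">one</div><!--
  --><div class="box">two</div>
</div>

<!-- 3. The line break is moved inside the end tag. -->
<div>
  <div class="box">one</div
  ><div class="box">two</div>
</div>
```

All three rely on the source staying exactly as written, which is the fragility described in the cons above: an editor or template engine that reflows the markup brings the gap back.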

Method 2) Adjust the CSS.

Pros:
1) Relatively easy to do
2) Keeps control of presentation in the stylesheet where it belongs.
3) Won’t break when someone reformats the html.
4) The client is very unlikely to break the relevant css. In fact in a lot of the templates I provide my existing CSS is not even touched. They may add css but they seldom change what I have set up in the first place. If the argument is that the css can accidentally be changed then that could happen to any of the CSS anyway, so it is really a red herring; otherwise we’d have to go back to tables and spacer gifs.
5) The html can remain nicely formatted and easy to work with.

Cons:
1) Client may break the css (None other that I can think of)
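A minimal sketch of what a stylesheet-side approach can look like, using the `font-size: 0` idea from the thread title. Note that the title itself reports this did not remove the gap in Safari and Chrome at the time, so treat it as a starting point rather than a finished fix. The class names and the 16px value are assumptions, and this is not the solution from the CSS quiz mentioned earlier.

```html
<!DOCTYPE html>
<title>Removing the gap from the stylesheet</title>
<style>
  /* "row" and "box" are hypothetical class names. */
  .row {
    font-size: 0;        /* shrink the space between the children */
  }
  .row .box {
    display: inline-block;
    width: 100px;
    background: #ccc;
    font-size: 16px;     /* restore a readable size for the content */
  }
</style>

<div class="row">
  <div class="box">one</div>
  <div class="box">two</div>
</div>
```

The markup keeps its normal indentation; the spacing is handled entirely by the rule on the parent, so reformatting the html does not change the result.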

Conclusion.

a) If the html is reformatted for presentational effect then said presentation is lost if the html is edited or reformatted. The chances of this happening - very likely.

b) If the CSS method is used then it doesn’t matter how the html is formatted but the CSS could still be unwittingly broken. The chances of this happening - very unlikely.

In all honesty and likelihood the odds that the html would break first must be very high. The css version could break but this is very unlikely to happen.

Discussion with client goes like this:

ME: “Would you like a page that breaks as soon as someone reformats the html? You can avoid this by contacting your programmers in India and teaching them all to ignore the well-established fact that html reformatting should not change the appearance. You can then contact everyone else in the chain and advise them of the same details. It should be quite easy to do and shouldn’t take you long to get them all up to speed.

Or would you like a robust page that will not break no matter how the html is formatted?”

Client: “Give me the page that breaks as soon as the html is edited and then I can sue you for all you’ve got.”

Joking aside, if you are asking me to give my client a fragile page that will break as soon as he looks at it or a robust page that he would have to try very hard to break then I know which I would choose every time. Common sense is the issue here as the html method has more chance of failure.

But hey “each to their own”; you do it your way and I’ll do it mine. We’ll just have to agree to differ.

system · November 2, 2010, 9:31pm

i’m sorry, but how does your phrase go against mine? i see that they say the same thing: present electronic content. did i mention any device dependent manner?

[ot]

since gary, i see you’re reviving your old tune, my friend. that’s fine, enjoy your bashing.[/ot]