Alt text and image captions

I’m working on a site with a number of images, one after another. Each image has a caption in <p> tags. My problem is what to do about the alt text, which would essentially be the same as the caption. If I include it, then screen reader users are going to have to listen to the same text, or very similar, twice over. If I put alt="" on the image, my understanding is that screen readers will take this as an indication that the image has no meaning and will ignore it - but presumably it will then go on to read the captions, without any explanation that these are captions for images, which will sound very odd.

What’s the best way to deal with this? I suppose I could add in the alt text, then use a speech stylesheet to prevent the captions being read, which would be interesting, as I’ve never done anything like that before… (:

I’d be grateful for any help with this.

You need alt text that will tell those people that the following text is the caption of the image that they can’t see and provides what they need to know about it. Not exactly sure what the appropriate wording would be though.

Ah, now that’s an approach I hadn’t thought of, and it makes sense. But as you say, what would be the appropriate wording? I’ll need to ponder that. :slight_smile:

I was under the impression that limited support makes aural style sheets practically useless, and that mainstream user agents are not required to support them. You also have to consider people who have images disabled but don’t use a screen reader.

If you are just considering the screen reader, you could potentially give it a null ALT, since the context of the image is given within the P caption (with this I am assuming you envisaged an explicit tie-in and not a descriptive paragraph below). However, if the image is a link then you must present something to the user on the image itself.

HTML5: techniques for providing useful text alternatives provides some advice.
Particularly the sections:

This article may also be of interest: HTML5 Accessibility Chops: using nested figure elements.

I’ve had another thought about this. If I give the section a heading of “Gallery of…” or even “20 images of…”, that tells everybody - screen readers and sighted users with images disabled - that the content is a series of images. If I then use alt="" and the caption in <p> tags, it should be pretty clear what’s happening. I could also add a “skip images” link, to allow screen reader users to decide whether they want to listen to a series of captions or not. How does that sound?
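A minimal sketch of that proposal in HTML 4 style markup, for illustration only. The heading text, file name, class name, caption text, and the id "after-gallery" are all hypothetical; the skip link simply targets the first paragraph after the gallery.

```html
<h2>Gallery of 20 images</h2>
<p><a href="#after-gallery">Skip images</a></p>

<div class="gallery">
  <!-- Empty alt: the image itself is not announced, only the caption is read -->
  <img src="image-01.jpg" alt="">
  <p>Caption for the first image.</p>
</div>

<!-- Target of the skip link: the first paragraph after the gallery -->
<p id="after-gallery">Text that follows the gallery.</p>
```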

@stevefaulkner: Thanks for the links, but HTML5 isn’t an option.

The problem with your proposed solution is that when you set alt="" on an image, it is no longer available to a screen reader user. They could not find it on the page unless they look at the source code.

The following examples are conforming HTML 4 and HTML5.

A suggested approach is to include a label in the alt attribute as in the example below:

<img src="clara.jpg" alt="Photo 1.">
<p>Photo 1: Clara in her bedroom, playing her 'electric' toy guitar.
She looks like a real 'Rock &amp; Roll' girl.</p>

This provides an implicit association between the image and its caption (via the text “photo 1”). As the alt attribute is not null the presence of the image will be announced to screen reader users.

If you want to strengthen the association between the image and its caption, there are a number of methods, for example:

<p>
<img src="clara.jpg" alt="Photo 1.">
<span style="display:block">Clara in her bedroom, playing her 'electric' toy guitar.
She looks like a real 'Rock &amp; Roll' girl.</span>
</p>

Wrapping the image and caption together in a p element provides a programmatic association missing in the first example.

I’m sure I’m missing something here, but why would a screen reader user want to find the image? I am very used to providing alt text to convey the meaning of the image, but in this one situation, all the images also have captions, and I am trying to avoid feeding screen readers redundant text.

That makes sense, but again it involves the reading of part of the text twice and I don’t understand how it helps to have each image announced. If I have a heading stating that this is a series of images, does that not convey enough information?

I may be missing something important about the way screen readers work, but I don’t understand the need to associate the text with a particular image. If the images were dotted throughout the page then yes, that would make perfect sense, but in this case, as I said, it is simply a series of images with captions.

I appreciate your suggestions, and I’m not dismissing them. I would just like to understand the issues a bit better.

I’m sure I’m missing something here, but why would a screen reader user want to find the image?

A number of reasons:

  • Not all screen reader users have no vision.
  • People with vision impairments find it useful to have the role of visual objects announced.
  • Even if they have no vision it does not mean they do not want to save or copy an image for some purpose.

and I am trying to avoid feeding screen readers redundant text.

I was trying to provide you with some solutions given your limitations (no HTML5) which I also presumed meant no ARIA.

I don’t understand how it helps to have each image announced.

A caption captions something. Hiding the presence of the object being captioned introduces ambiguity:

<p>A horse and a donkey.</p>
<p>Meeting the queen.</p>

Is the above a single caption or multiple captions? How would a user who was not aware of the presence of an image or images know?

Doesn’t that do it?

Depending on the browser, sighted users with images disabled are aware of the images; refer to: test result screenshots

Doesn’t that do it?

I am sure it won’t do it for a range of users.

Hi Steve, thanks for chipping in.

Since only the validator would really whine if she did use ARIA (meaning, unless she has a validation-nazi at her work who doesn’t understand the point of validation), she could use it. I would think she’s only avoiding unstable new elements like figcaption, which need JS to exist for IE etc…
But, the proposal I’ve seen (with aria-describedby), I thought was something people are arguing against? Or was that only in the face of the removal of stuff like longdesc? Certainly that wasn’t the original point of aria-describedby…


One thing that just occurred to me was, if you had a friend who said “hey one of my dog pics ended up in this gallery” and you wanted to grab that image to mail to someone else, and you came to the page with the “Gallery of 20 pics” header and the captions, with empty alt attributes, would that mean you could find where the pic was (since the caption would have the relevant info) but you wouldn’t be able to set your reading-focus on the image in the browser in order to Save As… or anything?

If you (illegally) leave out the alt attribute entirely, most readers announce “graphic” (for English) or similar and then often the URL… if most of the big names are silent with alt="" then I suppose using a word like “graphic” or the better “Image/Photo #” (with the number) in the alt isn’t going to be unusual to a user. You’d be supplanting a normally built-in function of announcing object type.

Without a linking-type label the user has to go by approximation (who is next to what) to know which text describes which image. Some pages put captions on top or to the left, others underneath, so if you’ve gone directly to your keyword (say with search in your browser) then you wouldn’t know right away where the image was that was related to the text you found. Which explains why Google often brings up the strangest images when I search for some things… the googlebot found a caption or bit of text nearby but turns out they aren’t related at all, just next to each other on the page.

Off Topic:

I b*tch and moan about HTML5 a lot but I really like the idea of having something “native” that explicitly links or associates captions with the things those captions are describing.

aria-describedby is designed and implemented to point to a text description (as against an accessible name/label, for which aria-labelledby should be used) somewhere else on the page. As implemented, it works best for interactive controls.
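A rough sketch of how aria-describedby could tie an image to its caption, reusing the Clara example from earlier in the thread. The id "caption-1" is hypothetical, and this is an assumption about how one might apply the attribute here, not a tested recommendation: as noted above, it works best on interactive controls, so support on plain images may vary between screen readers.

```html
<!-- alt carries the short label; aria-describedby references the caption by id -->
<img src="clara.jpg" alt="Photo 1." aria-describedby="caption-1">
<p id="caption-1">Clara in her bedroom, playing her 'electric' toy guitar.</p>
```

An HTML 4 validator will flag the aria- attribute, so this would sit on top of a basic method such as the alt label, not replace it.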

One thing that just occurred to me was, if you had a friend who said “hey one of my dog pics ended up in this gallery” and you wanted to grab that image to mail to someone else, and you came to the page with the “Gallery of 20 pics” header and the captions, with empty alt attributes, would that mean you could find where the pic was (since the caption would have the relevant info) but you wouldn’t be able to set your reading-focus on the image in the browser in order to Save As… or anything?

Agreed, this is what I was trying to get across.

If you (illegally) leave out the alt attribute entirely, most readers announce “graphic” (for English) or similar and then often the URL… if most of the big names are silent with alt="" then I suppose using a word like “graphic” or the better “Image/Photo #” (with the number) in the alt isn’t going to be unusual to a user. You’d be supplanting a normally built-in function of announcing object type.

Partially correct. If you have an <img> without an alt, in most cases the default behaviour of AT is to ignore the image, unless it is contained within an interactive element such as a link or button. This article I wrote a while back (2007) goes into more detail: Investigating the proposed alt attribute recommendations in HTML 5 (http://www.paciellogroup.com/resources/articles/altinhtml5.html)

Without a linking-type label the user has to go by approximation (who is next to what) to know which text describes which image. Some pages put captions on top or to the left, others underneath, so if you’ve gone directly to your keyword (say with search in your browser) then you wouldn’t know right away where the image was that was related to the text you found. Which explains why Google often brings up the strangest images when I search for some things… the googlebot found a caption or bit of text nearby but turns out they aren’t related at all, just next to each other on the page.

Yes, the second example I provided is better as it provides a programmatically determinable association; both image and caption are contained within the p element. But the promise and partial realization (https://bugzilla.mozilla.org/show_bug.cgi?id=658272#c3) of figure/figcaption is a dedicated mechanism for assigning captions to images and other content.
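For comparison, a sketch of the same image and caption using the HTML5 figure/figcaption elements. This is illustrative only, since HTML5 isn’t an option for the site in question; the alt value simply carries over the label used in the earlier examples.

```html
<figure>
  <img src="clara.jpg" alt="Photo 1.">
  <!-- figcaption is the caption for everything inside the enclosing figure -->
  <figcaption>Clara in her bedroom, playing her 'electric' toy guitar.</figcaption>
</figure>
```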

Off Topic:

I b*tch and moan about HTML5 a lot but I really like the idea of having something “native” that explicitly links or associates captions with the things those captions are describing.

I know you like to moan :slight_smile:

Yes, exactly. I did look at ARIA, but from what I’ve read, it’s also not yet widely supported. I’d be happy to use it in addition to other methods, if that’s likely to help, but I need some basic method that will work back to IE7.

OK, now I get the point. (I think I was being generally slow on the uptake yesterday. :blush:) Without the <img> being announced, there’s no way to know which comes first, the image or the caption, right?

Thank you - that’s helpful.

Do you know, I’ve often wondered about those odd results. This is turning out to be a day of enlightenment. :slight_smile:

+1. I found myself thinking rather longingly of figure and figcaption.

OK - that makes sense.

My images are all local landmarks or beauty spots and the captions will read something like “The 18th century church at *** is unusual because blah blah blah” or “The peninsula has several secluded beaches, including the white sands at such-and-such bay.” So rather than using “image 1” etc in the alt text, I could use “*** church”, “*** beach”, “*** lighthouse” etc., to strengthen the connection. Is that preferable?

And is it worth including a “Skip images” link? It seems to me that 20 image captions is a lot to sit through, especially if you’ve already heard them once. Do screen readers have a native mechanism to move on to the next section of text? There is no following heading to target.

Thanks for the help.

If you want to view such a large gallery, the odds are you will be prepared to sit through at least a handful of images, or else not want to view any of them. To me it sounds like an ALT would be fine, since it looks like the caption or supporting text will be more in-depth.

The caption might read: “The XYZZY graveyard is populated by lichen-covered stones going back to ancient times, X Century, and including…” The ALT would tell you that you are viewing “A grey gravestone in the shape of a Celtic Cross”, etc.

Unfortunate that Mozilla’s the only one to still get this right. I mean, in the test, the missing image is pretty big, but when it’s smaller than your alt text, everyone seems happy to cut that text off (except Mozilla). It’s actually one of my reasons for sticking with this bloat-browser.

So far I’ve only been using it in forms/form controls.
But it would probably then work best with your examples where there’s a link involved (like from the example 2.2 at w3.org)?

I’d consider it widely supported, at least some things. browser support test
It also depends on the reader. They’re expensive to update (the big commercial ones I mean) plus Window-Eyes seems perpetually behind everything new recently. But most of the last several versions of JAWS, NVDA and Orca have ARIA support. Not sure about that SAToGo thing, I’ve never touched that one.

So yeah I agree with you it should be a supplement today. But do use it!

What’s after the images? Just more text?

Screen readers have all sorts of ways to navigate, though Gonz (Chris Hofsteder) was telling me not too long ago how sadly normal it is that many screen reader users never seem to learn to use all the possibilities of their tool. Meaning, that there are a lot of people still using Tab and whatnot. That would drive me nuts, because you can move about in so many ways (best if the page is structured well of course). It’s kinda like we can do all sorts of things with browsers, like enlarging text, while the people who would love that most (like seniors) tend to be totally oblivious to these abilities.

Certainly screen readers would be kinda worthless if your only real option in most cases was to sit through and listen to everything linearly.

I mean, if I see a chunk of junk, especially if it’s a long bunch of links, I’ll N for “next Non-link text” for example. But I was lazy enough that I wanted to learn the quick-keys.
Orca locates “objects” and you can choose to just jump to the end of an “object” if you want. (A list of links can be treated as a single line, so one down-arrow and you’re out, for example)

You can also add aria landmark roles around the page too. Yeah, the HTML4/XHTML validator will whine at you. Whatever.
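A sketch of what landmark roles might look like on an HTML 4 page built from divs. The page structure, ids, and placeholder text are hypothetical; banner, navigation, main and contentinfo are standard ARIA landmark roles.

```html
<div id="header" role="banner">Site name and logo</div>
<div id="nav" role="navigation">Site links</div>

<div id="content" role="main">
  <p>Introductory paragraph.</p>
  <div class="gallery">
    <!-- images and captions go here -->
  </div>
  <p>Paragraphs after the gallery.</p>
</div>

<div id="footer" role="contentinfo">Copyright and contact details</div>
```

Screen readers with ARIA support can then jump between these regions, which gives users one more way past a long run of captions.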

I’m personally conservative in my skip links, but depending on how your page is structured it might be a good idea.

Using keywords for your “alt labels” is probably also good, though do you eventually start getting church1 and church2? : ) But yeah I like that better than image1, 2 but I’ve also seen image galleries where the caption has a date in bold, then descriptive text in normal type.

<p><span>5 Dec 2011:</span> The kids were excitedly waiting for Sint and Piet</p>
where the date, if unique, would be great as an alt label.

I can haz shortcutz? : )

Shortcut yes that is what it is. >;-)

OK - I promise. :slight_smile:

Yes. There’s an introductory paragraph, then the <div> with the images, then several more paragraphs.

That’s all very helpful to know. I don’t know anybody who uses a screen reader, so I’ve no one to ask about its workings.

I can live with that. Looks like I should learn more about ARIA.

I like the date idea, but it wouldn’t apply here, as they’re not unique. I do have more than one church picture, but they’re different churches, so (e.g.) “Timbuktu church” and “Auchtermuchty church” would solve that.
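Combined with the earlier wrapped-in-a-paragraph pattern, keyword alt labels of that kind might look something like this. The file names and caption text are placeholders.

```html
<p>
  <!-- Short keyword label in alt; the full description stays in the caption -->
  <img src="timbuktu-church.jpg" alt="Timbuktu church.">
  <span style="display:block">Caption text for the Timbuktu church photo.</span>
</p>
<p>
  <img src="auchtermuchty-church.jpg" alt="Auchtermuchty church.">
  <span style="display:block">Caption text for the Auchtermuchty church photo.</span>
</p>
```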

That was really where I started, but then I discovered I was saying pretty much the same thing in each, as the text is fairly descriptive of the pictures. Perhaps I just need to get more creative. :rolleyes: Or perhaps I should write the alt text and ask my husband to write the captions. He’s an artist and sees things very differently to me. (:

Again, thanks to all for the help.

…and that’s where example 9.1 comes into play from Steve’s links. :slight_smile:

“The Round Church in Bowmore, built in 1767, was designed so that there were no corners for the devil to hide in.” If that’s my caption, I find it hard to see what I could add to the alt text to make it more descriptive. It’s a picture of a round church. :slight_smile: Sure, if my caption read “Fun on the beach at Blackpool”, then “Tom, Dick and Harry riding donkeys” would make perfect sense for the alt text, because there’s nothing in the caption to indicate what’s happening. The same applies to the linked example.

However, I have another question about screen readers and alt text.

I frequently have to use Gaelic names in both my body text and my alt text. In the body text, I wrap a <span lang="gd"> around the offending item, but I can’t do that in alt text. Some names are OK, in that it’s possible to make an Anglicized pronunciation, but others are not. What does a screen reader do if it comes across Bunnahabhain or Samhchair in supposedly-English alt text? I’m really just wondering if there’s some way to ensure that the user doesn’t think their screen reader is malfunctioning, as I suspect support for Scots Gaelic is not built in to most systems. :slight_smile:
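To illustrate the difference: a sketch of the span technique in a caption, next to an alt attribute where no markup is possible. The file name and caption wording are hypothetical.

```html
<!-- alt is plain text only, so the Gaelic name cannot be wrapped in a span here -->
<img src="photo-01.jpg" alt="Bunnahabhain.">

<!-- In body text the name can be marked as Scottish Gaelic with lang="gd" -->
<p>A view of <span lang="gd">Bunnahabhain</span> from the shore.</p>
```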