7 Tips/techniques for multi-language sites

A recent article reiterates the things all web developers should be doing anyway, but need to be especially aware of when building multilingual sites.

http://www.nomensa.com/blog/2010/7-tips-and-techniques-for-multi-lingual-website-accessibility/

tl;dr:
#1 Set the language code on the html tag (though I’ll add, go ahead and be redundant and use the content-language meta tag as well: <meta http-equiv="content-language" content="en">)
#2 Set the language on inner tags if and when they differ from the page’s declared language (see note below about hreflang)
#3 Google sucks… see below
#4 Set the direction of the language (ltr is the default, so there’s no need to set it unless, as in #2, you need to override the main page setting), and let the browser deal with it instead of using CSS floats and the like to imitate proper direction! The W3C is clear on this. If you have a page written in Hebrew or Arabic, don’t leave the page set to ltr and then try to use CSS or illogical text order to make it seem rtl. Set rtl in the HTML.
#5 Set a charset. Yes. Do it before the title or anything else in the <head> while you’re at it: browsers can and will restart loading after determining the charset (though IE6 tries hard to guess based on language heuristics or something weird). So get them to do that as soon as possible, NOT after trying to load your title, meta description, CSS, JavaScript, favicon and whatever else you’ve got in there.
#6 Set font size based on language (this one’s pretty cool actually… see David’s inline box model page and scroll down to see an example difference between baselines for characters, for example)
#7 watch out for differences in word-length (this one hits us every time, doesn’t it? And may make you re-think things like horizontal menus!) This is why blind language settings without actually checking can get you in trouble. As a related note, word-wrapping is also affected!
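
To make tips #1, #4, #5 and #6 concrete, here is a minimal sketch of a page written in Hebrew. It is only an illustration, not markup from the linked article: the sample text and the 112.5% font size are my own assumptions, and it uses the short <meta charset> form, which assumes an HTML5-aware browser.

```html
<!DOCTYPE html>
<!-- #1 and #4: language and direction are declared once, on the root element -->
<html lang="he" dir="rtl">
<head>
  <!-- #5: the charset comes before the title and everything else in the head -->
  <meta charset="utf-8">
  <title>דוגמה</title>
  <style>
    /* #6: size text per language. The selector is limited to the root
       element so the percentage is not compounded on nested elements.
       112.5% is an arbitrary illustrative value. */
    html:lang(he) { font-size: 112.5%; }
  </style>
</head>
<body>
  <!-- No floats or reversed text needed: the browser lays this out right-to-left -->
  <p>שלום עולם</p>
</body>
</html>
```

Note that nothing in the CSS deals with direction at all; dir="rtl" in the HTML does that job, which is the point of tip #4.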


I think “accessibility” in the title here is limiting it: this is also usability and frankly it’s correct HTML… with two? exceptions I’m not sure of:

2 Links:

I had to look around for the hreflang attribute because I had actually never heard of it before. From what I’ve read, it’s only really safe to use if you are absolutely certain that the user, upon clicking that link, WILL get the resource in that language. Meaning, it had better be hard-coded and not possible for the user’s browser to choose some default language they have set. So, not for links to wikipedia or google or anything external that you don’t control. Rather, put the language selection in the link text:
either “deze site in het Nederlands” with lang="nl" on the link, or “this site in Dutch” without a lang attribute on the link (since the link text is in the same language as the page). In both cases the page linked to has html lang="nl".
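
As a rough sketch of those options, assuming an English page (html lang="en") and a hypothetical /nl/ path for the Dutch version of the site:

```html
<!-- Option 1: the link text itself is Dutch, so the text gets lang="nl" -->
<a href="/nl/" lang="nl">deze site in het Nederlands</a>

<!-- Option 2: the link text is English like the rest of the page,
     so no lang attribute is needed on the link -->
<a href="/nl/">this site in Dutch</a>

<!-- hreflang describes the resource at the far end of the link.
     Only add it when you control that resource and it is always
     served in Dutch, whatever the visitor's browser settings are. -->
<a href="/nl/" hreflang="nl">this site in Dutch</a>
```

The difference to keep in mind: lang describes the link text, hreflang describes the page being linked to.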

For the rest, it should be on every dev’s checklist, whether they care about screen readers or not.

Also, I take issue with this:

In order to make language identification easier for Google, Google recommends only using one language per page.

I say, Tough. If my page content needs multiple languages, to hell with the googles. Use what makes sense with your content, because you ain’t writing for the googles… you are writing for human beings. And human beings understand the concept of multiple languages on a page. Remember that the lang attribute only applies within the scope of the tag you set it on: that element and everything nested inside it.
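
A small sketch of what that scoping looks like on a mixed-language page. The sample sentences are made up for illustration:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Mixed-language example</title>
</head>
<body>
  <!-- Inherits lang="en" from the root element -->
  <p>Dutch speakers call this
    <!-- lang="nl" applies to this span only -->
    <span lang="nl">gezellig</span>,
    a word with no neat English equivalent.</p>

  <!-- lang="fr" covers the blockquote and everything nested in it -->
  <blockquote lang="fr">
    <p>Ceci n’est pas une pipe.</p>
  </blockquote>

  <!-- Back to the page default, English -->
  <p>Everything outside those elements is still English.</p>
</body>
</html>
```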

Notice that Google will try to steer users to a particular language of a site, if available, not based on the user’s browser settings but based entirely on geo-location. This sucks butt when you’re traveling. You can turn this “feature” off, but only permanently after accepting a cookie, which doesn’t help as you go from hotel computer to internet cafe to wherever.

You probably won’t have done much with hreflang because it is mainly there to tell the user-agent about the resource at the far end of a link. So as an author you have to know for certain what language is at the other end. Typically you will have the language declared within most of your own documents anyway.

Once you start pandering to the search engines over common sense, then you should worry.