I made a start setting up a DB structure for multiple language websites, but I have the feeling that I miss something and I can’t put my finger on what it is. This are the tables I have so far:
CREATE TABLE IF NOT EXISTS `languages` (
`language_id` TINYINT(2) NOT NULL AUTO_INCREMENT,
`language` VARCHAR(12) DEFAULT NULL,
`country_flag` CHAR(8) DEFAULT NULL,
`isActive` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`language_id`),
UNIQUE KEY `language` (`language`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `pages` (
`page_id` TINYINT(2) NOT NULL AUTO_INCREMENT,
`page_address` VARCHAR(128) DEFAULT NULL,
PRIMARY KEY (`page_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `content` (
`content_id` TINYINT(2) NOT NULL AUTO_INCREMENT,
`page_id` TINYINT(2) NOT NULL DEFAULT '0',
`title` VARCHAR(40) DEFAULT NULL,
`heading` VARCHAR(255) DEFAULT NULL,
`content` TEXT,
PRIMARY KEY (`content_id`),
KEY `page_id` (`page_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 ;
Those tables are self explaining I guess. table languages holds the different languages with the requested number of languages set to active. table pages are the actual pages in the website: about_us.html etc. table content holds the actual content for the different pages. And at this point I have a block! I think I need at least 1 if not more relational tables. Or should I add another field language_id to the content table? Can anyone tell me If I miss something here or that I’m on the right track
Is the language relative to the page or the content?
I would suggest that it relates to the content and so language should be a 1:1 relationship between each content record in the db. that means a ‘title’ and ‘content’ written in French, will show in French whereas a ‘title’ and ‘content’ written in Greek, will show in Greek. 1:1
in my CMS, I have these table structures
languages:
| language_abbr | language_full | flag |
| ENG (pk) | English | eng.gif |
| GRE (pk) | Greek | gre.gif |
file_sequencing
| file_id | file_name_home_language | sequence_number |
*sequence_number determines the order they display in the nav menu
** home language is the home language of the business/user not the web viewer
file_data
| id | file_id | language_abbr | title | heading | content |
by the way, it’s one-to-one only as seen from the content side – from the language side, it’s many-to-one, as there can (hopefully) be more than one content for each language
i would use the international 2-char language code for the language PK, but the 3-char one is okay too, i guess
Good explanation indeed IBazz. :tup: This helps me a lot indeed. One question though! Is the a particular reason why you use the language_abbr as PK instead of a number?
@Rudy: can you explain what you mean with this:
i would use the international 2-char language code for the language PK, but the 3-char one is okay too, i guess
The auto_increment was indeed the thing that made me confused I guess
Is the a particular reason why you use the language_abbr as PK instead of a number?
in simple terms it’s because its unique to a language. (OK, so there is a variety of English such as US English and International english but, fundamentally, its all English). and if there is a natural PK in your table content, there is little benefit in using a surrogate key.
However, on a technical point, I am not sure if (in other circumstances), a huge PK value would be as efficient in terms of bytes, as an auto_increment number. Personally, I value readability quite highly when moving from table to table as part of the development process.
ISO 3166 are country codes, not language codes…
ISO 639 are language codes
BUT
Having built multiple multi-lingual databases, I would suggest you use BOTH… and establish a LOCALE table instead of a LANGUAGE table.
Locale is the combination of language and country, e.g. en_US, or en_GB. It gives you more ability to localize your application not only on language, but currency, number formatting, metric-vs-kings measurements, etc.
I typically structure my ML DB as such:
-- Fill this with all ISO-639 languages
CREATE TABLE language (
id CHAR(2) NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL
) TYPE=InnoDB;
-- Fill this with all ISO-3166 Countries
CREATE TABLE country (
id CHAR(2) NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL
) TYPE=InnoDB;
CREATE TABLE locale (
language_id CHAR(2) NOT NULL REFERENCES language(id),
country_id CHAR(2) NULL REFERENCES country(id),
name VARCHAR(255),
active TINYINT NULL
PRIMARY KEY (language_id, country_id)
) TYPE=InnoDB;
You can either prefill your entire locale table and use an active/inactive flag (as shown) or only fill it with locales you support.
Country should be nullable so you can create a default locale with no country (e.g. “en” or “es”). Typically you should have your application use a particular country’s locale act as the default. This can be achieved by adding a “default” flag to your country table and making “country_id” not null… then any non-country-specific locale or unsupported country’s locale would fall back to the default country’s locale.
-- Fill this with all ISO-639-1 languages
CREATE TABLE `languages` (
`language_id` CHAR(2) NOT NULL PRIMARY KEY,
`name` VARCHAR(255) NOT NULL
) ENGINE=INNODB DEFAULT CHARSET=utf8;
-- Fill this with all ISO-3166 Countries
CREATE TABLE `countries` (
`country_id` CHAR(2) NOT NULL PRIMARY KEY,
`name` VARCHAR(255) NOT NULL,
`default` TINYINT NOT NULL DEFAULT 0
) ENGINE=INNODB DEFAULT CHARSET=utf8;
-- Fill this table with supported locales
CREATE TABLE `locales` (
`locale_id` VARCHAR(5) NOT NULL PRIMARY KEY, -- This would be like 'en_US', etc.
`language_id` CHAR(2) NOT NULL REFERENCES `languages`(`language_id`),
`country_id` CHAR(2) NULL REFERENCES `countries`(`country_id`),
`name` VARCHAR(255),
`active` TINYINT NOT NULL DEFAULT 1
PRIMARY KEY (`language_id`, `country_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `pages` (
`page_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`page_address` VARCHAR(128) DEFAULT NULL,
PRIMARY KEY (`page_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `page_contents` (
`page_id` INT UNSIGNED NOT NULL REFERENCES pages(page_id),
`locale_id` VARCHAR UNSIGNED NOT NULL REFERENCES locales(locale_id),
`title` VARCHAR(40) DEFAULT NULL,
`heading` VARCHAR(255) DEFAULT NULL,
`content` TEXT,
PRIMARY KEY (`page_id`, `locale_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 ;
Yeah, i would agree with that. On the application side of things two or three alphabetic characters isn’t going to be an issue passing via the URL if its even needed.
transio, that new design doesn’t support translations. How would you determine the available translations for an article that originated in English?
That design most certainly DOES support translations. Is the originating language important?
the page_contents table contains the contents of all pages in multiple languages. The page_id and locale_id combination define which page and what language the contents are in.
Well, I can’t profess on the issue but, I wonder if locales is a bit ‘PC gone too far’. Surely its about communication? so if the different dialects don’t cause a breakage in communication then the does the adage ‘if it ain’t broke, don’t fix it’, apply?
Maybe it’s useful for multi-national entities but the cost of translation is quite significant for smaller businesses. I digress :oops:
Locale implies more than just language, IBazz… For example, en_US, en_GB, en_CA, en_ZA, en_AU, en_NZ - all imply “English”, and can fall back on the default “English” dialect you choose for your site… but which ones use mm/dd/yyyy vs. dd/mm/yyyy? Which use meters and kilos vs. feet and pounds? These are important “locale” settings that have nothing to do with language. If your mission is to localize your website, why restrict your DB to language when it’s not much more work to account for full locale, and allow future localization of other non-linguistic information?
Don’t think that’s important? If you print your date for an upcoming event as 04/05/2011 - half your audience may believe it’s April 5th, the other half May 4th… think that’s not an important miscommunication? Think again.
Heh… “equally confusing EVERYONE” doesn’t equate to “no confusion anywhere”. I’ve used dates in YYYY-MM-DD format in websites before… the specific case I’m thinking of was a form with a jQuery datepicker that filled the field with a Y-m-d date. I never saw so much negative user feedback in my life!!! People called us at least every other day confused about the dates!!! Don’t assume people are smart… that’s a sure way to develop poor usability apps.