Article Database Design Scalability

Ok lets say I have a page and the body of that page is made up of blocks of text, lets call them sections. Each section is made up of a title and a body text, it may also have sub-sections containing also a title and body text. So we have something like this:

<page>

[INDENT]<section>
<section>

[INDENT]<sub-section>
<sub-section>
[/INDENT]<section>
<section>[/INDENT]
</page>

When it comes down to designing the database for this, it would be easier to have 3 tables, one for each level, the page, the section and the sub-section (the section/sub-section could probably easily function as one table).

Would there be scalability issues when storing the data this way?

Say the site developed 100 000 pages and each contained 10 sections then there’d be 1 000 000 rows in one table.
Would mysql significantly slow down searching a table that size?

Is it a better option to store all sections and sub-sections as a string in the page table then split it with php?

Thankyou.

unless you’re planning to re-use the sections and subsections, i.e. the same section on multiple pages, then there’s no reason why you’d need more than one table, containing one chunk of page text

on the other hand, things like nav bars and footers would be re-used, so you might want sections after all

me, i’d still put them all into one table, such that each section links to the page it belongs to, in a hierarchical structure

The page and section have different attributes so they can’t be stored in the hierarchal structure I think you’re suggesting.

A page would contain something like:
id
title
type
parent_id
path
img_url
body
–> this is where sections could be stored in a large string

where as section:
id
parent_page_id
parent_section_id
title
body

The sections are unique to each page, and wont be used anywhere else, so storing them in one of the page table columns as a large block of text might be the best way of doing things. It seems much easier now i think about it.

Thanks

Considering a section looks like a categorization for a page I would use a m:n approach rather than 1:m.

page

  • id

section

  • id
  • parent_id | fk: section(id)

page_section

  • page_id | fk: page(id)
  • section_id | fk: section(id)

Ah ok, I haven’t thought of it like that. That seems a simpler database structure. I would prefer to use a structure like this but it would take 3 queries to display read the sections, wouldn’t it? Do you think that would suffer from performance issues on a large scale? I would like to have the structure for this website to grow very large and suffer as little performance issues as possible.

Adjacency list a is proven method/pattern to handle hierarchical relationships. There are other techniques such as; nested set, closure table and path enumeration. However, given the initial question I would recommend sticking with the adjacency list considering it is the simplest one grasp. There are quit a few people that swear by nested set but it is much more complex and difficult to grasp. Especially true for someone who does not yet have a solid grasp of relational database fundamentals. Just stick with the adjacency list.

Sorry just so I can gain a proper understanding, I apologise if it’s at all frustrating to explain.
I’ve been using this as a reference: http://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html

The relationship between the page & section are as follows:

page –> section: 1 page contains many sections – 1:n
section –> page: 1 section is contained within 1 page – 1:1

Based one what I understand from the method linked above, I’ve concluded that the relationship between page:section is 1:n

Just to be clear both page & section are recursive and use the adjacency list method.

Why do you suggest it should be an m:n relationship?

Does it have anything to do with both page & section being recursive?

Why shouldn’t the section & page_section tables you suggested be one table?

Thanks for your help.

why is page recursive?

i understand how you want a section to have subsections, but then why would a page have subpages? wouldn’t those subpages just be other sections?

No. Sub-pages would be a way of easily creating ‘categories’. With the data i’m dealing with, really the only difference between a page and category is how it’s displayed. The sections would be the content displayed within the page.

Say I had apage on architecture. This page on architecture would have lots of content broken up into sections. There would be an introduction section, a styles section, a materials section etc. Each section would consist mainly of a title, a body of text and maybe a few images.

Each section may have sub-sections. For example the styles section may have subsections georgian, roman or egyptian architecture.

Then say I come along and expand the georgian architecture section. It becomes really big, in fact so big that it should be its own page. So I separate it into it’s own page but it is still really a part of the architecture page, it should be under an architecture category. Making it a sub-page is a way of creating that category without knowledge it was needed before hand.

Data used in both a category and page are the same, just displayed differently depending on whether they have children. This is also convenient as the situation arises where it is better to display a page with children as a page and not a category. It’s a bit hard to explain, but it’s the best way to display my content and have that flexibility. I think.

Does that make sense?

yup

seems you’ve thought this through fairly well

i’m wondering why you posted this thread, now :slight_smile:

Haha me too, but I have learnt a lot as a result.

To be honest I think the question has been misinterpreted. I’m really new to using databases and am quite unfamiliar with its limitations.

Instead of having a sections table, I was considering having sections written as a large block of text in the pages table which would be in a coded format like html, or wikitext for example, that would then be later decoded for display by php. This would save creating a second connection to the database to retrieve the section table, i was wondering if it would cut down processing costs? I thought this may be a minimal performance issue that might increase as the tables grew.

I have been looking at wikimedia’s database schema though and they seem to be running many database queries without any issues. So I don’t think it will be a problem.

I just want to allow for this website to get large without any performance issues, and would rather deal with them now than many years down the track after it has all been implemented.