One Large Table vs. lots of smaller tables

Hi,

I’ve got a project where users will receive messages from each other. These messages have to be kept (forever).

Each user will probably receive 100 messages per month (some will receive 1000 or more).

There are 1000 users.

Should I create one massive table (with about 1.2 million rows / year) or should I create 1000 tables (1 for each user)?

Or should I create 1 table for each year?

Each message only consists of 5 columns.

Also, I’d like to use caching, but there will be “blasts” to each user. That’s one reason why I’m thinking a single table doesn’t work (because each time a row is inserted, I lose the cache). Some members won’t receive messages for days or weeks, and then they’ll get a ton all at once.

Also, I’m using the InnoDB engine.

I’d appreciate suggestions.

use one big table

properly indexed, there is no performance issue
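
something like this, for example (just a sketch, i’m guessing at your five columns and names):

```sql
-- hypothetical single messages table; column names and types are guesses
CREATE TABLE messages (
    message_id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id     INT UNSIGNED    NOT NULL,  -- recipient
    sender_id   INT UNSIGNED    NOT NULL,
    body        VARCHAR(255)    NOT NULL,  -- sms-sized payload
    created_at  DATETIME        NOT NULL,
    PRIMARY KEY (message_id),
    KEY idx_user_created (user_id, created_at)  -- per-user, time-ordered reads
) ENGINE=InnoDB;

-- one user's recent messages: served from idx_user_created, not a full scan
SELECT message_id, sender_id, body, created_at
  FROM messages
 WHERE user_id = 123
 ORDER BY created_at DESC
 LIMIT 50;
```

with an index like that, a million or so rows a year is nothing to worry about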

what kind of caching did you have in mind? the whole table?

I don’t want to cache the whole table (it wouldn’t work anyway since rows will be added all the time). I was thinking that smaller tables could be cached… I’m just starting to think through this table setup, so I’m not sure yet whether I’ll cache this stuff. A lot of the other tables will get cached, as they won’t have nearly as much movement.

I’ll probably just use a summary table for this one though and have it updated every hour or so… It shouldn’t need to be more current than that.

I haven’t worked with tables holding millions of rows before, so I wasn’t sure what the overhead would be. I didn’t want to end up with a nightmare after the first year or something.

I just discovered Triggers! :slight_smile:

I might just use those to keep the summary tables up to date. I was thinking I would have to COUNT() that huge table, and that would take a bit of work (I can’t use MyISAM)… But I could just run the full count overnight to make sure everything is up to snuff, and use triggers to keep real-time stats…
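
Something along these lines is what I have in mind (just a sketch; the table and column names are made up since I haven’t designed the real schema yet):

```sql
-- Hypothetical per-user summary table, kept current by a trigger on the
-- (equally hypothetical) messages table.
CREATE TABLE user_message_stats (
    user_id       INT UNSIGNED    NOT NULL,
    message_count BIGINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id)
) ENGINE=InnoDB;

-- After every insert into messages, bump the recipient's counter,
-- creating the summary row the first time it's needed.
CREATE TRIGGER trg_messages_after_insert
AFTER INSERT ON messages
FOR EACH ROW
    INSERT INTO user_message_stats (user_id, message_count)
    VALUES (NEW.user_id, 1)
    ON DUPLICATE KEY UPDATE message_count = message_count + 1;
```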

Edit: Good thing I’ve moved to MySQL 5 :slight_smile: That’s probably why I hadn’t noticed triggers before… I had briefly heard about them last summer, but I didn’t have time to look into them… bad move…

using triggers for summary tables is a tried and true method

don’t be too hasty to look for performance solutions like that, though, until you actually need them

remember what they said about “premature optimization” …

Well, I need to keep a COUNT() on a table with millions of rows, and since it’s InnoDB, it has to scan every row each time it gets counted. I could run the count on every page refresh, but that’s going to be costly and slow! Or I could have a cron job update a summary table every few minutes. Or I could have triggers update the summary table (when a row gets added, the trigger just increments the summary table appropriately), and then have a cron job recalculate the summary table every couple of hours to keep it honest.

I think that last option is probably the best way to keep the stats real-time and accurate: each member gets access to their realtime stats without the overhead of recounting on page loads. And because rows are inserted into the table whether or not a given user’s stats change, caching won’t really work; there will be inserts every couple of seconds.
I’m sure there’s a solution with memcache or something, but I don’t know anything about those.
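
Roughly what I’m picturing, with the same made-up table names as before:

```sql
-- Same hypothetical tables as in the earlier sketch.
-- Periodic resync from the cron job: rebuild the counters from scratch
-- so any drift in the trigger-maintained numbers gets corrected.
REPLACE INTO user_message_stats (user_id, message_count)
SELECT user_id, COUNT(*)
  FROM messages
 GROUP BY user_id;

-- Per-page-load read: a single-row lookup instead of counting millions of rows.
SELECT message_count
  FROM user_message_stats
 WHERE user_id = 123;
```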

Unless you have a smarter way to keep a table count in this situation?

i’m not convinced that you ~need~ to keep the count at all

:slight_smile:

Realtime Analytics…
For ease of discussion I said that this DB was for keeping track of messages between users… that’s not entirely correct…
It’s for keeping track of advertising campaigns run through text messaging. Clients get billed on a per-message basis, so it’s important that they have access to realtime stats… Some campaigns are time-sensitive (e.g. an on-stage event incorporating audience interaction via cell phones).

billing for advertising sounds to me like a great case for “last night’s data should be good enough”

Most of the time that would be totally fine, except for the time-sensitive stuff, like polls. They need realtime information for those (you can’t wait 20 minutes at a talent show, etc., for results). Though perhaps I could have one table with the time-sensitive information, and then leave the rest of the data in the other table.
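
For the time-sensitive stuff, I’m picturing a small dedicated table that’s cheap to aggregate live, something like this (made-up names again):

```sql
-- Hypothetical table for a time-sensitive campaign (e.g. a live poll);
-- it stays small, so counting it on demand is cheap.
CREATE TABLE poll_responses (
    response_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    campaign_id INT UNSIGNED    NOT NULL,
    choice      VARCHAR(32)     NOT NULL,
    created_at  DATETIME        NOT NULL,
    PRIMARY KEY (response_id),
    KEY idx_campaign_choice (campaign_id, choice)
) ENGINE=InnoDB;

-- Live results for one campaign, counted straight off the index.
SELECT choice, COUNT(*) AS votes
  FROM poll_responses
 WHERE campaign_id = 42
 GROUP BY choice;
```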

Wouldn’t I still need summary tables? I’d just be updating them less frequently (and not using triggers). Or would I cache it? (The frontend is built on top of CodeIgniter, which has some caching abilities… I think I can just set a cache with an expiry or something to that effect.) I think it just stores the summary in a text file and retrieves those results rather than bothering the database…

But a trigger would add minimal overhead and allow for realtime stats. The user wouldn’t see any lag at all.

(As the user sends a text message, it goes to our modem and then gets pushed to our API… Even if that added half a second, or even a few seconds, nobody would notice, since the lag between sending and receiving texts between phones is sometimes already 10-15 seconds (or longer, depending on carrier, etc.) and always at least a full second.)

Thoughts? Should I avoid the trigger, and should I use a different method than summary tables?

front end? cache? modem? api? one table and the other table? summary tables? triggers? codeigniter? text file?

looks like i don’t know nearly enough about your requirements to offer any more advice than i already have, sorry

That’s okay. Thanks for your help though! I’ve done some reading around some of the things you’ve said… so hopefully I can make some decisions before we plug away on this part of the project! We aren’t working on this portion of the database yet :slight_smile: