MySQL View: recommended algorithm?

Teckinno · April 9, 2008, 5:30pm

Hi all,

I was wondering if you could explain a little more what is the best choice for a view algorithm. Reading on Google and MySQL Dev site did not helped me to clear my concerns, so i was hoping one expert could enlighten me.

The scenario is simple:
I create a view that pulls data from 5 columns, some of them being JOINed from other tables:

SELECT thread.threadid, thread.threadtitle, post.postid, forum.forumtitle, user.username
FROM thread
LEFT JOIN post ON (thread.threadid = post.threadid)
LEFT JOIN user ON (thread.userid = user.userid)
LEFT JOIN forum ON (thread.threadid = forum.threadid)

Then I run a simple view:

SELECT * FROM view WHERE dateline < 84600

I have indexes on threadid, postid and forumid columns.
What algorithm you would use for the above case? UNDEFINED, MERGE or TEMPTABLE? Could you you please explain why also? I currently use TEMPTABLE, because the view is used into a cronjob that is performed every 10min. The idea is to release the table locks as soon as possible, so TEMPTABLE (from my understanding) will create a temporary table with the view data, then release then locks and push the SQL data. Do you think using TEMPTABLE could affect the server load? Is it better to simply use UNDEFINED?

Thanks for your explanations.

longneck · April 9, 2008, 7:14pm

generally, if any table involved in a view gets frequent updates then the view should be MERGE and not TEMPTABLE. the amount of time saved by releasing locks early with TEMPTABLE is probably smaller than the amount of time it takes to create the temporary table.

this can be mitigated by a two layer view if, for example, only posts and users are updated frequently but thread and forum are not:

create algorithm = temptable
  view _view_threads_inside
select ...
  from thread
left outer
  join forum
    on ...;

create algorithm merge
  view view_threads
select ...
  from _view_threads_inside
left outer
  join user
    on ...
left outer
  join post
    on ...;

Teckinno · April 9, 2008, 7:38pm

Thanks for the nice explanation. longneck.
The problem I need to deal is this:
Users post actively on a very large board. The data is stored first into post table (forum table gets updated with the latestpost column also), then thread table gets updated and finally… user table have a minimal change (statistics). This is independent of my view, it is handled by the default software. However, I want to pull (on the side, for personal usage) some of the data, from the above mentioned tables. We are talking about one user doing a view query every 10 minutes.

Do I still follow your advice (multiple layers)? Thanks for your patience.

r937 · April 9, 2008, 7:52pm

just run the query, never mind the views, temp tables, merges, heaps, stacks, piles, or whatever…

Teckinno · April 9, 2008, 7:59pm

Aha, I thought a view is more performant, compared to a direct (large) query?
The idea is to also learn how to use properly the algorithms.

longneck · April 9, 2008, 10:58pm

in your case, a view will not improve performance. performance is just one small advantage of views.

the biggest win of views is abstraction. you can hide the implementation of your daa behind a single statement. this helps guard against dependency on specific table structures; if the tables need to change, then you only need to update the views that reference those tables and not all your code. you can also secure your data on a row-by-row basis at the server by including current_user() in a where clause somewhere.

Teckinno · April 10, 2008, 8:10pm

Thanks for the clarification.