COUNT() optimisation, moderately large dataset

EquinoxZA · June 30, 2010, 6:22am

Hi folks

A website I am working with has grown fairly large and we’re starting to take a hit from my (admittedly) fairly novice SQL queries.

I have an hourly cron job which calculates the number of views each article has received. I have over 10,000 articles and, for compatibility with my predecessor’s code, a “views” table with 10 million rows.

We’re trying to tweak this query:

UPDATE `articles` a 
SET a.views = a.views + ( 
SELECT COUNT( apv.id ) FROM `article_pages` ap 
LEFT JOIN `article_page_views` apv ON page_id = ap.id 
WHERE ap.article_id = a.id AND apv.created > 1277878382
)

I would like it to only update articles for which there are actually new page views. I am doing this the wrong way round though, because it updates every article in the database (whether they have a new view or not) – just not sure how to fix it!

Thanks in advance for your assistance!

EquinoxZA · June 30, 2010, 7:33am

You’re a star! I’ll take a look at the manual to try and figure out why that works, but that’ll get me going. Thank you.

r937 · June 30, 2010, 6:58am

try this –

UPDATE articles AS a 
INNER
  JOIN article_pages AS ap
    ON ap.article_id = a.id
INNER
  JOIN ( SELECT page_id
              , COUNT(*) AS views
           FROM article_page_views 
          WHERE created > 1277878382
         GROUP
             BY page_id ) AS apv
    ON apv.page_id = ap.id 
   SET a.views = a.views + apv.views
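
One thing to watch, assuming an article can have more than one row in article_pages: in MySQL's multi-table UPDATE, each matching row is updated only once even if it matches the join several times, so an article with several viewed pages may get only one page's count added. A possible variant, offered as an untested sketch, sums the new views per article inside the derived table so that each article joins to at most one row. The alias v and the column name new_views are made up for illustration; the tables, columns and timestamp are the ones used above.

```sql
-- Sketch only: same tables and columns as the query above.
-- New page views are counted per article (not per page) in the derived
-- table, so every article matches at most one joined row.
UPDATE articles AS a
INNER
  JOIN ( SELECT ap.article_id
              , COUNT(*) AS new_views          -- hypothetical column alias
           FROM article_pages AS ap
         INNER
           JOIN article_page_views AS apv
             ON apv.page_id = ap.id
          WHERE apv.created > 1277878382
         GROUP
             BY ap.article_id ) AS v           -- hypothetical table alias
    ON v.article_id = a.id
   SET a.views = a.views + v.new_views;
```

As with the query above, the INNER JOIN means articles with no new page views are never touched.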