Outer join query

I’ve developed the following query to show the views and purchases of the products we sell. However, we just implemented the view tracking table so any dates prior to this week don’t have any view data at all. This query works fine as long as there are page views for the products during that period. However, since the older data doesn’t have views data, no data at all is being returned for older queries. How do I return NULL for page views when there are none during that period but there are sales? I assume the problem is due to the ON clause in the left outer join, but I’m not sure how else to do it.

SELECT 
   SUM( op.products_quantity ) AS num_sold, 
   SUM( op.final_price * op.products_quantity ) AS total_sales, 
   p.products_model, pd.products_name, 
   p.products_id,
   visits.visits
FROM orders_products AS op
   JOIN products AS p
   JOIN products_description AS pd
   JOIN orders as o
   LEFT OUTER JOIN (
      SELECT pv.products_id, count( pv.ip_addr ) as visits
         FROM products_visits AS pv
         WHERE pv.timestamp
         BETWEEN '".$start_date."'
         AND '".$end_date."'
         GROUP BY products_id
   ) visits ON visits.products_id=p.products_id
   WHERE p.products_id = op.products_id
   AND p.products_id = pd.products_id
   AND op.orders_id = o.orders_id
   AND visits.products_id = p.products_id
   AND o.date_purchased BETWEEN '".$start_date."' AND '".$end_date."'
   GROUP BY p.products_id
   ORDER BY total_sales DESC

nope :slight_smile:

the problem is that you have a condition in the WHERE clause which “negates” the left outer join

your ON clause says

ON visits.products_id=p.products_id

if this were all you did, it would be fine, and you would return rows from the p table which had no matching rows in the visits table, and the columns from the visits table would be NULL in those unmatched rows

however, you’ve sabotaged your outer join with this –


   WHERE ...
   AND visits.products_id = p.products_id

this ~forces~ the visits rows to have a non-NULL value, i.e. to match the p table

in effect, this changes the outer join to an inner join, because trhe unmatched rows are filtered out by your WHERE clause

make sense?

:slight_smile:

Thanks! Yes, I forgot to take the where clause out when I was playing with the query.

you need to put the other join conditions into their respective ON clauses as well

here’s your re-written query…

SELECT SUM( op.products_quantity ) AS num_sold
     , SUM( op.final_price * op.products_quantity ) AS total_sales
     , p.products_model
     , pd.products_name
     , p.products_id
     , visits.visits
  FROM orders as o
INNER
  JOIN orders_products AS op
    ON op.orders_id = o.orders_id
INNER
  JOIN products AS p
    ON p.products_id = op.products_id
INNER
  JOIN products_description AS pd
    ON pd.products_id = p.products_id
LEFT OUTER 
  JOIN ( SELECT products_id
              , COUNT(*) AS visits
           FROM products_visits
          WHERE timestamp BETWEEN '".$start_date."'
                              AND '".$end_date."'
         GROUP 
             BY products_id ) AS visits 
    ON visits.products_id = p.products_id
 WHERE o.date_purchased BETWEEN '".$start_date."' 
                            AND '".$end_date."'
GROUP 
    BY p.products_id
ORDER 
    BY total_sales DESC

note carefully the sequence of tables in the FROM clause, and notice that the only WHERE condition applies to the first table mentioned in the FROM clause – in effect, this is what “drives” the query, i.e. this is the most restricted table so i always write it first, so that the other tables are joined only to the rows you’re interested in

the way you had it, with unqualified joins missing their ON conditions, you’d get a humoungous cross join, after which the various conditions in your WHERE clause would start throwing most joined rows away

make sense?

Yes, that makes a lot of sense. Thanks for your help.