Need help understanding this query

adishardis · November 3, 2014, 9:44pm

I would appreciate if someone could explain this query that i found on this site (How do I return a group count for every 15 minutes of the hour on my Table?), the query is working perfectly but I really want to understand what’s going on in each step

$sql  = "SELECT 
	  d.x,
	  COUNT(DISTINCT t.email) as y
	FROM
	  (SELECT 
		min_date + INTERVAL number month AS x 
	  FROM
		(SELECT 
		  DATE('2012-10-01') AS min_date,
		  DATE('2014-11-04') AS max_date) AS m 
		CROSS JOIN tbl_numbers 
	  WHERE min_date + INTERVAL number month <= max_date) AS d 
	  LEFT OUTER JOIN `tbl_statistics` AS t 
		ON t.logdate BETWEEN d.x 
		AND d.x + INTERVAL 1 month 
             GROUP BY d.x"

Thanks in advance
/Adis

Andy_Blueman · November 4, 2014, 8:12am

Im no genius when it comes to this but most SQL queries are essentially the same, if you dont understand individual commands then google them.

SELECT (duh thats what we are always doing is selecting data from the db).
FROM (FROM) Well this is a little complex but basically its all FROM. The difference here is it appears to be using SELECTed information from the db to search the DB. and then also using a date selection as well.)
CROSS JOIN google it as im not sure on this but essentially I think its joining either 2 DB’s or at least tables and merging data? Not sure…
WHERE = Where data matches the following. More date specifics
LEFT OUTER JOIN … not sure on this. But from my little knowledge joins are getting data from another table. Again. Google JOINS and whatever DB this is im presuming MySQL?
GROUP BY as it says on the tin … Group the data that is returned by this.

Thats the best I can do for you , hopefully it will help if not maybe someone will give you a better answer thats not so noobish or wishy washy

[Edit]
Educating myself … http://www.sitepoint.com/understanding-sql-joins-mysql-database/ JOINS

r937 · November 4, 2014, 9:25am

okay, let’s start in the middle

(SELECT DATE('2012-10-01') AS min_date, DATE('2014-11-04') AS max_date) AS mthis produces a 1-row table called “m”, with 2 columns

now the next part is the cross join

CROSS JOIN tbl_numbers WHERE min_date + INTERVAL number month <= max_dateeach row of the “m” table (remember, there’s only one row) is joined with each row of the numbers table

now, the numbers table is special (for further information, please read the entire thread you linked to) because all it has is 1 column called “number” which contains the values 0, 1, 2, 3, … and this number is added to min_date, and the condition is, the min_date plus however many months added, has to be equal or less than end_date

so basically what we’ve just seen is the cross join creates a table, called “d”, of dates – ‘2012-10-01’, ‘2012-11-01’, ‘2012-12-01’, ‘2013-01-01’, and so on, right up to and including ‘2014-11-01’

the rest of the query should now make sense –

SELECT d.x, COUNT(DISTINCT t.email) as y FROM d LEFT OUTER JOIN `tbl_statistics` AS t ON t.logdate BETWEEN d.x AND d.x + INTERVAL 1 month GROUP BY d.xthe query obtains a count of the distinct emails for each month “x” in the date range

the whole business with the numbers table was intended to ensure that each month is represented in the results, even if there were no emails

molona · November 4, 2014, 9:36am

SELECT d.x, count(DISTICT t.email) as y

That means "show me the field name x in table d, and the count of the different values kept in field email in table t. When you show me that email count, give the column the name “y”

FROM (everything till the GROUP BY is part of this first FROM)
(SELECT
     min_date + INTERVAL number month as x
   FROM
      (SELECT ('2012-10-01') AS min_date, DATE('2014-11-04' AS max_date) as m
CROSS JOIN tbl_numbers
WHERE min_date+INTERVAL number month <= max_date) as d

This is a bit more complex because your table d is really a subquery… that includes another subquery in its FROM to create table named (or with alias of) d

So basically, your table d is created by creating two fields that do not really select values, these values are already give. These fields will have the names of min_date and max_date.

The whole subquery will have a name (alias) or m. It doesn’t need a FROM clause because I’m not really selecting any real field, just putting the values myself there.

  CROSS JOIN tbl_numbers 
    	  WHERE min_date + INTERVAL number month <= max_date) AS d 
    	  LEFT OUTER JOIN `tbl_statistics` AS t 
    		ON t.logdate BETWEEN d.x 
    		AND d.x + INTERVAL 1 month

Now, this ficticious table m will CROSS JOIN (or will do the cartesian product which is what CROSS JOIN means) with the table named tbl_numbers but will only be a valid output if when I add months to the min_date is still less or equal than the max_date.

numbers is just a field.

And all this whole thing is our table named as d

It is very typical to give aliases to tables (real or not). Some tables have really long names so giving it an alias of just one or two characters helps to write less.
In this case.

Now, our table d is joined with table tbl_statistics (alias as t, so when I refer to the name of the fields I don’t have to write the whole table name ;)) with a LEFT OUTER JOIN

That means that all the records of table d (the left part, because it comes first in the FROM) will be included in the result, but from table t (real name tbl_statistics), the only records included will be those that share the same values using the field t.logdate

Still, not all values will be shown because t.logdate value also has the condition of being between the value of the field x (coming from table t) and x + 1 month

GROUP BY means that the result (which will be only two fields, x and y, selected at the very beginning of the query) that for each different value of field x, show me the count number of emails (field y)

Not sure if this is clear

adishardis · November 4, 2014, 10:35am

Thank you so much, all of you!! I’m going to really go trough your answers this weekend to fully understand it but once again, THANK YOU!

Ps. maybe you guys could recommend any good book to delve further into this subject and maybe some book about big data analysis?

Cheers!!

molona · November 4, 2014, 11:33am

Rudy (r937) wrote a nice book about SQL. I think that since you’re starting at this, you may want to check it out and then jump into big data analysis

http://www.sitepoint.com/store/simply-sql/

adishardis · November 4, 2014, 2:05pm

Done and done! Going on holiday on tuesday so thats a perfect time to start reading up on the subject!

Take care and thanks again!
/Adam

system · February 3, 2015, 9:06pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.