Question from the book Simply SQL

saleemhasan · April 16, 2015, 7:08am

I am new to SQL and started with Simply SQL by Rudy Limeback. I really like the book. I am running a mysql server on a windows machine. My question utilizes the entries table from the content management system example. Actually, the question is much more basic. When we join two tables with a common field (primary key, foreign key), the resulting set (table) contains complete records for both tables that meet the selected criterion. However, when we operate on a single table, the resulting set is not always made up of entire records. For example, in the entries table if I run a query select id, category, title, created from entries where id = 1 or id = 2 or id = 3 or id = 4;
I will get 4 records from the entries which are complete. By complete, I mean that the category, title, time created will all be for the id = 1 or 2 or 3 or 4 in the original table.

mysql> select id, category, title, created from entries where id = 1 or id =2 or id =3 or id = 4;

±—±---------±----------------------------±--------------------+
| id | category | title | created |
±—±---------±----------------------------±--------------------+
| 1 | angst | What If I get Sick and Die? | 2015-04-12 20:57:24 |
| 2 | humor | Uncle Karl and the Gasoline | 2015-04-14 00:08:11 |
| 3 | advice | Be Nice to Me | 2015-04-12 23:09:35 |
| 4 | humor | Hello Statue | 2015-04-14 00:08:39 |
±—±---------±----------------------------±--------------------+

However, if I use the max(created) function to obtain the time at which say, the most recent humor article was created, I do not get the entire record when I enter the following query
mysql> select id, category, title, max(created) from entries;
±—±---------±----------------------------±--------------------+
| id | category | title | max(created) |
±—±---------±----------------------------±--------------------+
| 1 | angst | What If I get Sick and Die? | 2015-04-14 00:54:22 |
±—±---------±----------------------------±--------------------+
1 row in set (0.00 sec)

In this result, the id is 1, the category and the title are from record 1 but the max(created) time is correct but from an entirely different record. In order to get one record, we have to design a query with an inner join of the entries table on itself. Could someone please explain the reasons why sql is designed this way. Clearly, it must useful for more complicated queries and it would be nice to understand the reason.
Thank you very much in advance for your help.

Saleem

Mittineague · April 16, 2015, 7:11am

Hi saleemhasan, welcome to the forum,

I can’t think of anyone better to ask than @r937 himself

saleemhasan · April 16, 2015, 7:19am

Thanks Mittineague,
Hopefully, Rudy will drop by soon
Saleem

molona · April 16, 2015, 7:21am

He lives in Canada so he’ll probably be sleeping right now but it is more than likely that he will read this topic

Ophelie · April 16, 2015, 7:21am

@rakhithanimesh might also be able to help; he’s our Teaching Assistant for Simply SQL on Learnable.

molona · April 16, 2015, 7:57am

@Ophelie it is not that we don’t know the answer to this question is more like it is good to be answered by the writer himself when you have the chance

@saleemhasan I leave it up to you

r937 · April 16, 2015, 10:41am

[quote=“saleemhasan, post:1, topic:133387, full:true”]Could someone please explain the reasons why sql is designed this way.[/quote]sql is designed this way because logic

the reason you get some columns from the id=1 row, but the max(created) value from a different row, is because the sql that you ran is invalid –

select id, category, title, max(created) from entries

here you have three non-aggregate columns and one aggregate column in the SELECT clause

however, you’re missing the GROUP BY clause, and so the entire query is invalid

yes, i know it runs… mysql will happily try to execute even invalid sql

for a full explanation of both why it’s invalid and also why mysql will try to run it, please see http://rpbouman.blogspot.ca/2007/05/debunking-group-by-myths.html

this is a long article, very detailed, but it gives a complete answer

saleemhasan · April 16, 2015, 11:39pm

Hello r937

Thank you very much for your response.

I will definitely read the article you recommend to find out about the issues with non-aggregate columns being included with the aggregate columns.

I must point out that even if I “include group by category” in the select statement, I get the correct (most recent) timestamp for each category. However, the id and title (probably even the category) are not from the same record as the value of max(created). The values of id and title are from the very first record in that category and Not those that are in the record for the correct max(created).

mysql> select id, category, title, max(created) from entries group by category;
±—±----------±----------------------------±--------------------+
| id | category | title | max(created) |
±—±----------±----------------------------±--------------------+
| 3 | advice | Be Nice to Me | 2015-04-12 23:09:35 |
| 1 | angst | What If I get Sick and Die? | 2015-04-12 20:57:24 |
| 6 | computers | Windows Media Center Rocks | 2015-04-14 00:54:22 |
| 2 | humor | Uncle Karl and the Gasoline | 2015-04-14 00:08:39 |
| 5 | science | The Size of Our Galaxy | 2015-04-12 21:09:21 |
±—±----------±----------------------------±--------------------+
5 rows in set (0.02 sec)

Reminds me of the old complaint about computers – How they always give me what I ask for but never what I want

Thank you again for your help.

Saleem

r937 · April 17, 2015, 12:14am

you really need to read that article

saleemhasan · April 19, 2015, 7:48am

Hello r937,

Thank you very much for recommending that wonderful article. I believe I understood quite a bit of it. However, I do have a question.

The way I understood it, the rule is to add to the group by clause, all the columns listed in the select statement that are not functionally dependent on the existing column in the group by clause. For example, in the entries table, if I have a statement
select category, title from entries group by category;

The rule recommends that I add title to the group by clause in order to make it a correct query.
select category, title from entries group by category, title;
However, the resulting set will not group by category.

Why doesn’t it also ask me to check the list of fields in the select statement as one or more may be incompatible (not functionally dependent on the column in the group by clause) with the column in the group by clause. After all, even though correct, the changed query does not give me the desired result.

Thank you again for all your help.

Saleem

r937 · April 19, 2015, 8:20am

[quote=“saleemhasan, post:10, topic:133387, full:true”]Why doesn’t it also ask me to check the list of fields in the select statement …[/quote]i guess because it doesn’t know what you really want

if you want one row per category, but you also want to show a title, it’s really up to you to specify which title you want

system · July 19, 2015, 3:26pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.