Jump to content
Larry Ullman's Book Forums

Recommended Posts

What does GROUP BY clause do, exactly?

 

For the life of me, I can't understand it from the explanation given in the book (Chapter 7, pg. 115):

 

"When the aggregate functions are used with a GROUP BY clause, a single aggregate value will be returned for each row in the result set."

 

That may be not Geek anymore, but it's not quite yet English. How can this be explained using simple Socratic examples?

 

Do I understand it correctly that the GROUP BY clause breaks down into smaller chunks what would otherwise be one larger result?

 

Or let me approach it from a different angle: in what way GROUP BY is different from ORDER BY?

 

Thanks in advance for the explanation, as always!

Link to comment
Share on other sites

GROUP BY must often be applied to aggregate functions to get a correct result. The reason for this is because you need to determine what to aggregate on. Let's say we need to count the total goals of football players. Without a group by clause, we'd just count the goals in total. (I actually ran queries this time)

 

With some example data:

 

Player: ( player_id, name)

10, "Alessandro Del Piero"

14, "Mirko Vucinic"

3, "Giorgio Chiellini"

 

player_goal: (match_id, player_id, goal_time)

1, 10, 34

1, 10, 56

1, 14, 90

4, 10, 13

4, 3, 89

 

Consider this query: (Counting total of goals)

SELECT COUNT( * ) AS total, player_id FROM player_goal

 

That query would return 5, 10. The reason for this is that the count is not based on (grouped by) a single player. We therefor get the total number of goals. We also get player_id 10 because it's the first value found. (5 goals by player_id 10)

 

This query however, will count per player:

SELECT COUNT(*) as total, player_id
FROM goals
GROUP BY player_id
ORDER BY total DESC

 

Here we will get three rows of result, looking like this.

3, 10 (3 goals for player_id 10)

1, 14 (1 goal for player_id 14)

1, 3 (1 goal for player_id 3)

 

As you can see here, we use ORDER BY to sort on columns. In the above example, I order by total goals descending because it makes sense regarding the result we want. Ordering does not affect the result, only the ordering of it.

 

We could also sort on a second column. This would make more sense if an INNER JOIN to the player table is done. We could then sort goal scorers according to name as an example:

 

Count goals per player, then order by total goals, then player name

SELECT name, COUNT(*) AS total
FROM goal
INNER JOIN player ON ( goal.player_id = player.player_id )
GROUP BY goal.player_id
ORDER BY total DESC, name ASC
LIMIT 0 , 100

 

This would return this result:

Alessandro Del Piero, 3 (3 goals, player_id 10)

Giorgio Chiellini, 1 (1 goals, player_id 3)

Mirko Vucinic, 1 (1 goals, player_id 14)

 

I have personally tested and confirmed this on a very similar dataset.

  • Upvote 2
Link to comment
Share on other sites

Excellent explanation, thank you very much, Antonio!

 

(Larry, no offence, but that's the way it should have been properly done in the book.)

 

A quick question though, Antonio: in your last query example, you have this:

 

( goal.player_id = player_id )

 

Shouldn't it be --

 

 

( goal.player_id = player.player_id )

 

-- or is it a form of shortcut syntax I'm unfamiliar with? If the latter, could you please briefly explain?

 

Thank you!

Link to comment
Share on other sites

 Share

×
×
  • Create New...