Dimitri Vorontzov Posted December 7, 2012 Share Posted December 7, 2012 What does GROUP BY clause do, exactly? For the life of me, I can't understand it from the explanation given in the book (Chapter 7, pg. 115): "When the aggregate functions are used with a GROUP BY clause, a single aggregate value will be returned for each row in the result set." That may be not Geek anymore, but it's not quite yet English. How can this be explained using simple Socratic examples? Do I understand it correctly that the GROUP BY clause breaks down into smaller chunks what would otherwise be one larger result? Or let me approach it from a different angle: in what way GROUP BY is different from ORDER BY? Thanks in advance for the explanation, as always! Link to comment Share on other sites More sharing options...
Antonio Conte Posted December 7, 2012 Share Posted December 7, 2012 GROUP BY must often be applied to aggregate functions to get a correct result. The reason for this is because you need to determine what to aggregate on. Let's say we need to count the total goals of football players. Without a group by clause, we'd just count the goals in total. (I actually ran queries this time) With some example data: Player: ( player_id, name) 10, "Alessandro Del Piero" 14, "Mirko Vucinic" 3, "Giorgio Chiellini" player_goal: (match_id, player_id, goal_time) 1, 10, 34 1, 10, 56 1, 14, 90 4, 10, 13 4, 3, 89 Consider this query: (Counting total of goals) SELECT COUNT( * ) AS total, player_id FROM player_goal That query would return 5, 10. The reason for this is that the count is not based on (grouped by) a single player. We therefor get the total number of goals. We also get player_id 10 because it's the first value found. (5 goals by player_id 10) This query however, will count per player: SELECT COUNT(*) as total, player_id FROM goals GROUP BY player_id ORDER BY total DESC Here we will get three rows of result, looking like this. 3, 10 (3 goals for player_id 10) 1, 14 (1 goal for player_id 14) 1, 3 (1 goal for player_id 3) As you can see here, we use ORDER BY to sort on columns. In the above example, I order by total goals descending because it makes sense regarding the result we want. Ordering does not affect the result, only the ordering of it. We could also sort on a second column. This would make more sense if an INNER JOIN to the player table is done. We could then sort goal scorers according to name as an example: Count goals per player, then order by total goals, then player name SELECT name, COUNT(*) AS total FROM goal INNER JOIN player ON ( goal.player_id = player.player_id ) GROUP BY goal.player_id ORDER BY total DESC, name ASC LIMIT 0 , 100 This would return this result: Alessandro Del Piero, 3 (3 goals, player_id 10) Giorgio Chiellini, 1 (1 goals, player_id 3) Mirko Vucinic, 1 (1 goals, player_id 14) I have personally tested and confirmed this on a very similar dataset. 2 Link to comment Share on other sites More sharing options...
Dimitri Vorontzov Posted December 7, 2012 Author Share Posted December 7, 2012 Excellent explanation, thank you very much, Antonio! (Larry, no offence, but that's the way it should have been properly done in the book.) A quick question though, Antonio: in your last query example, you have this: ( goal.player_id = player_id ) Shouldn't it be -- ( goal.player_id = player.player_id ) -- or is it a form of shortcut syntax I'm unfamiliar with? If the latter, could you please briefly explain? Thank you! Link to comment Share on other sites More sharing options...
Antonio Conte Posted December 7, 2012 Share Posted December 7, 2012 Yes, it should be "player.player_id" as you said, not "player.id". A small error there. Thanks for the nice words, btw. Link to comment Share on other sites More sharing options...
Dimitri Vorontzov Posted December 7, 2012 Author Share Posted December 7, 2012 Thanks for clearing this up, Antonio! Link to comment Share on other sites More sharing options...
Recommended Posts