
One part of my homework assignment is to find the student with the highest average from each department.

QUERY:

SELECT g.sid as studentID, s.sfirstname, s.dcode, AVG(grade) as average
FROM studentgrades g, student s
WHERE g.sid = s.sid
GROUP BY s.sid

RESULT:

1   Robert  ger 80.0000
2   Julie   sta 77.0000
3   Michael csc 84.0000
4   Julia   csc 100.0000
5   Patric  csc 86.0000
6   Jill    sta 74.5000

To answer the question, I ran this query:

SELECT dcode, averages.sfirstName, MAX(averages.average)
FROM (
    SELECT g.sid as studentID, s.sfirstname, s.dcode, AVG(grade) as average
    FROM studentgrades g, student s
    WHERE g.sid = s.sid
    GROUP BY s.sid) averages
GROUP BY dcode

RESULT:

csc Michael 100.0000
ger Robert  80.0000
sta Julie   77.0000

Even though the averages are correct, the names are not! Julia is the one who has the average 100 in csc, so why does Michael show up?


Here's an example:

A student takes courses and gets grades for these courses. For example:

student1 from dept1 took course A and got grade 80
student1 from dept1 took course B and got grade 90
student2 from dept1 took course C and got grade 100
student3 from dept2 took course X and got grade 90

AFTER RUNNING THE FIRST QUERY we get the averages for each student

student 1 from dept1 has average 85
student 2 from dept1 has average 100
student 3 from dept2 has average 90

Now we find the student with the highest average from each department

dept1, student2, 100
dept2, student3, 90
  • Which DBMS are you using? Because the query you have is invalid SQL. Commented Nov 4, 2012 at 8:13
  • See here: rpbouman.blogspot.de/2007/05/debunking-group-by-myths.html to understand why your group by is incorrect. Commented Nov 4, 2012 at 8:34
  • each student is associated with a department (in our case the departments are csc, ger, and sta). For each department we need to print out the name and average grade of the student with the highest average grade. Commented Nov 4, 2012 at 9:03
  • I edited the main question with sample data Commented Nov 4, 2012 at 9:26
  • Thanks, that clears things up (at least for me). You still haven't answered the question of which DBMS you are using. Is it really MySQL (as the incorrect SQL seems to indicate)? Commented Nov 4, 2012 at 9:30

3 Answers


This should do it (it uses GROUP BY according to the SQL standard, not the way MySQL implements it):

select s.sid,
       s.sfirstname,
       s.dcode,
       ag.avg_grade
from students s
  join (select sid, avg(grade) as avg_grade
        from studentgrades 
        group by sid) ag on ag.sid = s.sid
  join (select s.dcode,
               max(avg_grade) max_avg_grade
        from students s 
          join (select sid, avg(grade) as avg_grade
                from studentgrades 
                group by sid) ag on ag.sid = s.sid
        group by s.dcode) mag on mag.dcode = s.dcode and mag.max_avg_grade = ag.avg_grade
order by ag.avg_grade;

How this works

This builds up the result in several steps. First it calculates the average grade for each student:

select sid, avg(grade) as avg_grade
from studentgrades 
group by sid

Based on the result of this statement, we can calculate the maximum average grade for each department:

select s.dcode,
       max(avg_grade) max_avg_grade
from students s 
  join (select sid, avg(grade) as avg_grade
        from studentgrades 
        group by sid) ag on ag.sid = s.sid
group by s.dcode

Now these two results are joined to the students table. For easier reading, assume there are two views called avg_grades (the first statement) and max_avg_grades (the second one).

The final statement then basically does this:

select s.sid,
       s.sfirstname,
       s.dcode,
       ag.avg_grade
from students s
  join avg_grades ag on ag.sid = s.sid
  join max_avg_grades mag 
    on mag.dcode = s.dcode 
   and mag.max_avg_grade = ag.avg_grade;

The real one (the very first in my answer) simply replaces the names avg_grades and max_avg_grades with the selects I have shown. That's why it looks so complicated.
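
To try the join-based statement end to end, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The individual grades are hypothetical, chosen only so that they reproduce the averages shown in the question, and the table is named students as in this answer (the question calls it student). The inner aliases s2 and ag2 and the final order by s.dcode are my own additions for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (sid INTEGER PRIMARY KEY, sfirstname TEXT, dcode TEXT);
    CREATE TABLE studentgrades (sid INTEGER, grade INTEGER);
    INSERT INTO students VALUES
        (1, 'Robert', 'ger'), (2, 'Julie', 'sta'), (3, 'Michael', 'csc'),
        (4, 'Julia', 'csc'), (5, 'Patric', 'csc'), (6, 'Jill', 'sta');
    -- Hypothetical grades, picked only to reproduce the averages in the question.
    INSERT INTO studentgrades VALUES
        (1, 80), (2, 77), (3, 84), (4, 100), (5, 86), (6, 74), (6, 75);
""")

QUERY = """
select s.sid, s.sfirstname, s.dcode, ag.avg_grade
from students s
  join (select sid, avg(grade) as avg_grade
        from studentgrades
        group by sid) ag on ag.sid = s.sid
  join (select s2.dcode, max(ag2.avg_grade) as max_avg_grade
        from students s2
          join (select sid, avg(grade) as avg_grade
                from studentgrades
                group by sid) ag2 on ag2.sid = s2.sid
        group by s2.dcode) mag
    on mag.dcode = s.dcode and mag.max_avg_grade = ag.avg_grade
order by s.dcode
"""

# One row per department winner (more than one if several students tie).
best_per_department = conn.execute(QUERY).fetchall()
for row in best_per_department:
    print(row)
```

With this data the csc row names Julia, not Michael, because the name comes from the row whose average equals the department maximum instead of from an arbitrary row of the group.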

A solution in standard SQL that is a bit more readable

In standard SQL, this could be expressed using common table expressions, which makes it a bit more readable (but is essentially the same thing):

with avg_grades (sid, avg_grade) as (
  select sid, avg(grade) as avg_grade
  from studentgrades 
  group by sid
), 
max_avg_grades (dcode, max_avg_grade) as (
  select s.dcode, max(avg_grade) max_avg_grade
  from students s 
     join avg_grades ag on ag.sid = s.sid
  group by s.dcode
)
select s.sid,
       s.sfirstname,
       s.dcode,
       ag.avg_grade
from students s
  join avg_grades ag on ag.sid = s.sid
  join max_avg_grades mag on mag.dcode = s.dcode and mag.max_avg_grade = ag.avg_grade;

But MySQL (before version 8.0) is one of the very few DBMS that do not support this, so there you will need to stick with the initial statement.

A standard SQL solution requiring fewer derived tables

In standard SQL it could be written even a bit shorter using window functions to calculate the rank inside a department (again, this does not work in MySQL before version 8.0):

with avg_grades (sid, avg_grade) as (
  select sid, avg(grade) as avg_grade
  from studentgrades 
  group by sid
)
select sid, 
       sfirstname,
       dcode,
       avg_grade
from (       
  select s.sid,
         s.sfirstname,
         s.dcode,
         ag.avg_grade,
         rank() over (partition by s.dcode order by ag.avg_grade desc) as rnk
  from students s
    join avg_grades ag on ag.sid = s.sid
) t
where rnk = 1;
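
A note on ties: rank() gives rows with equal avg_grade the same rank, so the filter rnk = 1 keeps every student who shares the top average (row_number() would keep only one). The sketch below checks this with Python's sqlite3 module. It assumes a bundled SQLite of version 3.25 or later for window functions, and the names and grades are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (sid INTEGER PRIMARY KEY, sfirstname TEXT, dcode TEXT);
    CREATE TABLE studentgrades (sid INTEGER, grade INTEGER);
    -- Hypothetical data: Ann and Bob both average 90 in dept1.
    INSERT INTO students VALUES
        (1, 'Ann', 'dept1'), (2, 'Bob', 'dept1'),
        (3, 'Cid', 'dept1'), (4, 'Dee', 'dept2');
    INSERT INTO studentgrades VALUES
        (1, 90), (2, 80), (2, 100), (3, 70), (4, 85);
""")

QUERY = """
with avg_grades (sid, avg_grade) as (
  select sid, avg(grade)
  from studentgrades
  group by sid
)
select sid, sfirstname, dcode, avg_grade
from (
  select s.sid, s.sfirstname, s.dcode, ag.avg_grade,
         rank() over (partition by s.dcode order by ag.avg_grade desc) as rnk
  from students s
    join avg_grades ag on ag.sid = s.sid
) t
where rnk = 1
order by dcode, sid
"""

# Both tied students from dept1 come back, plus the single winner of dept2.
top_ranked = conn.execute(QUERY).fetchall()
for row in top_ranked:
    print(row)
```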

1 Comment

wow this is great. the only problem I see with what you wrote is that if there is a tie for first place among students, it will return only one of them and not all.

Update the query to use a HAVING clause as below:

   SELECT dcode, averages.sfirstName, averages.average
   FROM (
       SELECT g.sid as studentID, s.sfirstname, s.dcode, AVG(grade) as average
       FROM studentgrades g, student s
       WHERE g.sid = s.sid
       GROUP BY s.sid) averages
  GROUP BY dcode
  HAVING MAX(averages.average) = averages.average

5 Comments

The inner statement is invalid because not all non-aggregate columns are listed in the group by.
@a_horse_with_no_name: That is right, but I didn't change it because the OP mentioned it was working. Maybe because each sid corresponds to only one name and dcode :)
And besides = max(averages.average) is invalid as well.
1. do you mean that it should be GROUP BY studentID, s.sfirstname, s.dcode ? Wouldn't that be wrong, since in the inner query I want to group the average by student id? 2. if max(averages.average) is incorrect, then what is the right syntax?
@indieman: an aggregate function cannot be used in a WHERE condition. It must be used in a having condition (see Yogendra's edit)

There are many different solutions. Maybe this one is simpler to understand:

/* create a new temporary table of student performance, to keep the code clean and the performance better */
INSERT INTO studentperformance (studentID, sfirstname, dcode, average)
SELECT g.sid as studentID
     , s.sfirstname
     , s.dcode
     , AVG(grade) as average
FROM studentgrades g, student s
WHERE g.sid = s.sid
GROUP BY s.sid;

/* best grades for each department */
INSERT INTO bestgrades (best_average_per_department)
SELECT (dcode + '|' + MAX(average)) as best_average_per_department /* important string. maybe one has to cast the max(average) to string for this to work */
FROM studentperformance
GROUP BY dcode; /* important group by! */

/* get all the students who are best in each department */
SELECT a.studentID
     , a.sfirstname
     , a.dcode
     , a.average
FROM studentperformance as a
JOIN bestgrades as b on (a.dcode + '|' + a.average) = b.best_average_per_department; 

3 Comments

Why not use two columns for the intermediate bestgrades table? Combining two different values (and of different types as well) into a single column is not very efficient nor good practice
I used this as a hack for "WHERE ... IN" statements in TSQL, so maybe this is why I used it here. It is not the best way for this problem.
You can use multiple columns in a WHERE .. IN statement, e.g. where (a,b) in (select x,y from ..)
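
As a sketch of that row-value form, the example below fills a studentperformance table with the per-student averages shown in the question and then selects the best student per department with a two-column IN, so no string concatenation is needed. It runs on SQLite through Python's sqlite3 module, assuming SQLite 3.15 or later for row values. Not every DBMS accepts this syntax (SQL Server, for instance, does not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studentperformance (
        studentID INTEGER, sfirstname TEXT, dcode TEXT, average REAL);
    -- Averages copied from the result of the first query in the question.
    INSERT INTO studentperformance VALUES
        (1, 'Robert', 'ger', 80.0), (2, 'Julie', 'sta', 77.0),
        (3, 'Michael', 'csc', 84.0), (4, 'Julia', 'csc', 100.0),
        (5, 'Patric', 'csc', 86.0), (6, 'Jill', 'sta', 74.5);
""")

QUERY = """
SELECT studentID, sfirstname, dcode, average
FROM studentperformance
WHERE (dcode, average) IN (SELECT dcode, MAX(average)
                           FROM studentperformance
                           GROUP BY dcode)
ORDER BY dcode
"""

# Each (dcode, average) pair is compared against the per-department maxima.
best_students = conn.execute(QUERY).fetchall()
for row in best_students:
    print(row)
```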
